agnes-the-ai-analyst

Author	SHA1	Message	Date
Vojtech	a46b9dc928	/home install-hero polish: license link contrast, auto-mode reorder, Shift+Tab guidance (#243 ) * Make /home install-hero links readable against blue background The Claude license-options link added in the previous commit inherited the default `<a>` style (`var(--hp-primary)` blue), which renders as blue-on-blue and is unreadable inside the blue install-hero. Add a scoped `.install-hero a` rule that uses white with an underline (matching the existing lead-paragraph contrast pattern) so any link nested in the hero stays legible. * Reorder /home install flow: auto-mode is now Step 2, Agnes install becomes Step 3 Step 3 (was Step 2) pastes a ~20-command bash bootstrap into a fresh Claude Code session. Without auto-mode enabled first, each Bash/edit command needs a manual approve click — bad UX for first-time users. Move auto-mode from the outside-hero `<details>` reference block into the install-hero as a real Step 2, between "install Claude Code" and "install Agnes". Content is the persistent `acceptEdits` snippet (write to ~/.claude/settings.json) plus a one-liner pointing at Shift+Tab for users who are already inside a running Claude Code session. YOLO mode for full Bash auto-approve stays on /setup-advanced behind the existing link. The outside-hero `setup-collapsible[data-section="step3"]` block is dropped — auto-mode is no longer reference content, it's a real install step, and duplicating it would just diverge over time. Onboarded users no longer see the auto-mode block at all (consistent with Steps 1 + 3 also hiding post-onboarding). Completion banner copy updated: "Step 1, 2 & 3 done — Claude Code installed, auto-mode set, Agnes ready". Dashboard CTA partial and other templates don't reference step numbers for this flow, so no adaptation needed there. * Simplify /home Step 2 to Shift+Tab only — drop the JSON snippet Operator pointed out two issues with the prior Step 2: 1. The settings.json snippet is redundant. Claude Code's first Shift+Tab cycle to auto-accept mode already prompts the user whether to persist it as default — Claude writes the config itself, no manual file edit needed. 2. The snippet only showed the POSIX path `~/.claude/settings.json`, which doesn't translate to native Windows. Replace the snippet + copy button with a plain Shift+Tab instruction, explicitly call out the first-time "make this the default?" prompt, and note that Claude handles the config write itself — same flow on macOS / Linux / WSL / Windows. Adds a fallback line for users who already closed the post-OAuth session. * Tighten /home Step 2 install-note to two paragraphs Operator: drop the 'Claude writes the setting itself, so this works the same on macOS / Linux / WSL / Windows...' line plus the 'auto-approves file edits going forward; Bash commands stay gated — that's the safe default' line. Both were filler — the make-default prompt already implies persistence, and gated Bash is the obvious default users won't be surprised by. Result: paragraph 1 carries Shift+Tab + first-time make-default say-yes + closed-session fallback in one breath; paragraph 2 keeps the verbatim YOLO link. Same affordances, less vertical space.	2026-05-11 16:46:58 +00:00
Vojtech	d6ad08f107	Flea-market upload guardrails + soft delete + JOIN-based admin queue (#233 ) * feat(store): flea-market upload guardrails + soft delete + JOIN-based admin queue Adds an end-to-end guardrails pipeline for store uploads (manifest + static-security + LLM review), persists blocked bundles for forensics, introduces soft-delete (Archive) semantics, consolidates the legacy /store/{id} surface into /marketplace/flea/{id}, and reworks the admin queue so lifecycle filters read live entity visibility via LEFT JOIN rather than a denormalized submission column. Schema v29 → v35: * v29 store_submissions table + store_entities.visibility_status * v30 file_size, bundle_sha256, bundle_purged_at on submissions * v31 reshape store_submissions (drop legacy unique on entity_id) * v32 store_entities.archived_at/by + 'archived' visibility value * v33 drop store_submissions.retry_count (unused) * v34 ensure idx_store_submissions_entity exists post column-drop * v35 broaden visibility_status enum + JOIN architecture cutover Pipeline (src/store_guardrails/): * Inline checks: manifest_check, static_scan, quality_check * LLM review configurable haiku\|sonnet\|opus (default haiku) * BackgroundTasks-driven async path with structured-output JSON * Per-submitter daily quota (default 50) * 30-day TTL purge job (POST /api/admin/run-blocked-purge) * Bundle SHA256 + size persisted; sha256 survives purge for forensics Visibility model: * pending \| approved \| hidden \| archived * _enforce_visibility returns 404 (no leak) for non-owner non-admin * Owner sees own non-approved entries via include_owner_id widening * Install refused with 409 entity_not_approved when not approved Soft-delete (DELETE /api/store/entities/{id}): * Default = soft (visibility_status='archived'); existing installs keep getting served the bundle so users don't lose the plugin * ?hard=true admin-only: drops bundle + cascades user_store_installs * Hard-delete preserves entity_id on submission as tombstone so audit_log linkage survives for the activity timeline Admin queue lifecycle (the JOIN refactor): * Verdict (store_submissions.status) is immutable forensic record * Lifecycle (store_entities.visibility_status) is live state * /admin/store/submissions Archived chip translates to `e.visibility_status='archived'` via LEFT JOIN — any path that flips visibility surfaces in the queue immediately * Detail page renders Status (verdict) and Entity lifecycle side by side so admins see "approved at review, now archived" at a glance URL consolidation: * /store/{id} deleted (no redirect, stale bookmarks 404) * /marketplace/flea/{id} is the canonical detail surface * Three in-tree callers (upload-success, my-stack card, store listing card) updated to point at the new URL * Quarantine banner extracted to _quarantine_banner.html partial, self-guarded, included from both flea detail templates * Banner JS auto-refreshes when the verdict lands by polling /api/marketplace/flea/{id}/detail (visibility_status + submission_status — the latter is needed because blocked_llm keeps the entity at visibility_status='pending') Audit log resource format: * runner.py emits prefixed `store_submission:{id}` (post-fix) * Detail-page timeline query handles three patterns: prefixed submission, helper-emitted `store_entity:{sub_id}`, and bare-id legacy rows — all surface in the activity timeline UX fixes: * Owner sees Under review / Quarantined / Hidden banner with status * Install button gray-disabled (not blue) when non-approved * Owner cannot delete quarantined entries (403); admin can * Admin queue: filter chips, sortable columns, paging, page-size * Auto-refresh queue every 5s while pending rows are visible * Store upload page file picker no longer opens twice (label → input default action collided with explicit JS handler) Tests: 168 passed across the guardrails suites (admin submissions, store API, inline / LLM / purge guardrails, store repositories, marketplace filter, schema version). New regression coverage includes: archive surfaces via JOIN even when API path is bypassed; deleted submission renders activity timeline (tombstone); flea detail surfaces submission_status only for owner/admin; detail page renders Entity lifecycle row; audit log resource format covers both helper and runner paths. * fix(store-guardrails): PR #233 follow-up — prompt injection, atomic PUT, BG race, schema, reaper, sort whitelist Addresses 9 of the 23 findings from the PR #233 review (spec at docs/superpowers/specs/2026-05-09-pr233-guardrails-fixes-spec.md). Merge-gate items #1-#6 plus high-value mediums #7, #9-#12, #23. Architectural items (#8 enum split, #14 factory) and pure maintainability (#15-#22) deferred to follow-ups. Security: * #1 prompt injection — SYSTEM_PROMPT now passed via the SDK's dedicated system= parameter; bundle wrapped in <bundle>...</bundle> sentinels declared data-only by the system prompt; literal sentinel strings in user content are escaped so an adversarial README can't forge a close tag. * #6 static scan honesty — module docstring + admin copy + docs declare static scan as signal not gate; .md/.txt/.rst/.html/.json/ .yaml/.yml/.toml skipped to avoid false positives on prose. AST mode for Python deferred (separate flag, FP comparison work). Correctness: * #2 PUT atomicity — bundles bake into plugin.staging-<rand>/ alongside live, atomic-rename on success; failed checks leave live tree byte-for-byte intact. * #3 BG-task race — set_visibility_if_pending guards verdict flips to the (pending, hidden) review window; admin archives during review survive; skipped flips audit-logged. * #4 v35 NOT NULL/DEFAULT — schema v35→v36 re-applies them on store_entities.visibility_status. CHECK constraint enforced application-side (DuckDB ADD CHECK on existing column unsupported). * #7 stuck-review reaper — reap_stuck_llm_reviews flips pending_llm rows older than guardrails.stuck_review_grace_seconds (default 1800) to review_error. Scheduler runs every 15 min via new /api/admin/run-reap-stuck-reviews. Set knob to 0 to disable. * #9 quota counter — count_blocked_for_submitter_since now counts blocked_inline + blocked_llm + review_error so a submitter triggering only LLM-blocked verdicts is bounded. * #10 missing risk_level — surfaces as review_error with error='missing_risk_level' instead of silently defaulting to 'medium' (which looked like a model-decided block). * #11 archived_at clear — set_visibility nulls archived_at + archived_by when transitioning out of 'archived' so a future read doesn't show stale archive forensics on an approved row. Maintainability: * #12 FSM doc comment — accurate insert/transition/lifecycle description in src/db.py near store_submissions schema. * #23 sort-key whitelist — admin queue rejects unknown sort keys with 400 invalid_sort_key; substring-replace footgun removed. Deferred (separate PRs): * #5 quota race — proper fix requires asyncio.Lock spanning the full pipeline; threading.Lock blocks event loop, DuckDB MVCC doesn't help. API-level slowapi bounds worst case for now. * #6 part 3 (AST static scan), #8 (enum split), #13 (import bundle docs), #14 (factory consolidation), #15-#22 (maint). Tests: * New: tests/test_store_guardrails_prompt_injection.py (corpus + trust-boundary invariants), tests/test_store_put_atomic.py, tests/test_store_guardrails_reaper.py. * Extended: test_store_guardrails_llm.py (system param, missing risk_level, BG race), test_admin_store_submissions.py (quota counter widening, sort whitelist 400), test_store_repositories.py (un-archive metadata clear), test_db_schema_version.py (v36). * Full suite: 3738 passed; 17 pre-existing baseline failures unchanged (db migration tests, cli binary rename, catalog export, user mgmt v5 backfill — confirmed by stash + rerun on clean tree).	2026-05-09 17:32:53 +04:00
Vojtech	2e2e1a1eca	feat(home): state-aware /home + /setup-advanced + schema v26 (#228 ) * feat(home+news): state-aware /home + /news + admin-edited news section Squash of the vr/home-page feature work for clean rebase onto main. Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase. What's in this PR: State-aware /home page - New `/home` route with hero + auto-mode + connectors (Asana / GWS / Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine branches a single template (`home_not_onboarded.html`); the install steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per- connector setup prompts hide once `users.onboarded=TRUE`. A completion badge replaces them. - "Mark me as offboarded" button reverses the flag without an SQL UPDATE. - `users.onboarded BOOLEAN` column added; default FALSE; flipped by the CLI's `agnes init` post-success POST and the `/admin/users` API. - Connector setup prompts pre-check whether the tool is already installed/connected before re-running setup. - GWS scope set widened to include Google Chat (`chat.spaces`, `chat.messages`). Single template + design tokens - `dashboard.html` now extends `base.html` via the new `{% block layout %}` opt-out (full-width pages skip the 800px `.container`). Net: every page shares one shell. - `style-custom.css` `:root` extended with `--space-{7,9,10,12}`, `--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`, `--focus-ring`, `--transition-`, `--width-{narrow,app,wide}` so inline page styles can migrate incrementally. Auth redirects honor AGNES_HOME_ROUTE* - `safe_next_path` resolves the configured home route when no `default=` is passed; OAuth callbacks, magic-link clicks, password form, and LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator picked) instead of always /dashboard. News section + /news permalink + /admin/news editor - Schema-bumped `news_template` table (single versioned entity, draft + publish gate). `published BOOLEAN` distinguishes draft from public; monotonically-increasing `version` per save; rows >30d pruned on save except the currently-displayed published version. - `/home` bottom-of-page renders the latest published intro with a "Read more →" link to `/news` (which renders the full body). - `/admin/news` editor with sandboxed live preview, versions table, per-row Unpublish, Format-help cheatsheet. - `agnes admin news show / draft / edit / publish / unpublish / versions / export` (CLI). Talks to the live server via the `/api/admin/news/` endpoints (PAT-authed) — no direct DB access so it coexists with a running uvicorn. - Optimistic-lock guard: `agnes admin news publish --version N` and PUT/PATCH endpoints accept `expected_version` and 409 with structured `{error: "version_conflict", expected, actual, actual_by}` when a concurrent admin replaced the draft. Edit refuses to overwrite a draft authored by someone else without `--force` or `--expect-version`. - nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips any iframe whose src is not on the YouTube/Vimeo/Loom allowlist; javascript:/data: schemes blocked everywhere. - Author CSS vocabulary: `.news-hero` (blue gradient hero block), `.callout`/`.callout-{info,warn,success,danger}`, `.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` — all consolidated in `style-custom.css` under "News content vocabulary (shared)" so /home perex, /news body, and /admin/news preview share one source of styling. - Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver). - `.news-content` table styling (border, header band, row-hover). `scripts/dev/run-local.sh`* — local uvicorn launcher. Pulls Google OAuth client id/secret from GCP Secret Manager (`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points `AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and `--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one- command iteration. `LOCAL_DEV_MODE=1` also enables the FastAPI debug toolbar. CLAUDE.md "Run tests before every push" section codifies `pytest tests/ -n auto -q` as non-negotiable before each push. Tests: 51 + 14 + 8 = 73 new tests across news-template repo, sanitizer, API, web, CLI; plus updated home/auth/template tests for the new shared-shell architecture. Origin docs (gitignored, customer-fork content): docs/brainstorms/home-page-requirements.md, docs/plans/2026-05-07-001-feat-home-page-plan.md. * feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle User-facing equivalent of the in-page "Mark me as (off)boarded" button on /home. POSTs /api/me/onboarded with {onboarded, source}; --source overrides the audit-log marker so flips made from the CLI vs the web button vs agnes init automation stay distinguishable. `status` reads via /api/me/profile (when present); falls back to a quick body-marker scan of /home so the read path doesn't write an audit_log row. PAT-authed via cli.client.api_post — same convention as agnes admin news / agnes admin add-user etc. Tests: 5 covering on/off/status round-trip, idempotency, and audit-log source recording. Full suite holds at 12 pre-existing failures (same set as before). * ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix Primary nav (post-rebase audit + per-user feedback): - Items: Home → Marketplace → Data Packages → Memory. Admin dropdown for admins only. The "Dashboard" label was renamed Home — point still resolves through `home_route` so customer instances on /dashboard still land there. - Activity Center moved into the Admin dropdown. Per-team adoption analytics is admin-consumed in practice; the route still allows any authed user for direct deep-links so existing /home tile + bookmarks keep working. - Memory link added (→ /corporate-memory) — was previously buried in the /home "Look around" tiles. - Setup local agent + My Stack dropped from main nav. Setup is the /home install flow's home now; My Stack lives as a tab inside /marketplace. /home tweaks: - Plugin marketplace tile now points at /marketplace (was /store — legacy from before the marketplace rebrand landed in #230). - "What's new" section header gets a green band (success-flavored D1FAE5 background, A7F3D0 border, darker green title) so the bottom-of-page news block visibly distinguishes from the blue install-hero at the top. Header strip only — body stays white. Test fix: test_home_route_resolution renamed `dashboard_link_uses_home_route` → `home_link_uses_home_route` and asserts `href="/home">Home` instead of `href="/home">Dashboard` after the label change. * fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag The server-side `users.onboarded` flip happens through two paths: 1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`. 2. Implicit `agnes init` POST → /api/me/onboarded on success. Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto- collapse to summary bars. They were actively working through those sections — the install POST never signalled "I'm done with the rest of setup", just "Agnes itself is installed". Decouple the section-collapse decision from the server flag: - Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE` (their completion is a hard server signal — Agnes IS installed). - Step 3 + Connect-your-tools: render flat by default in BOTH states. Wrapped in `<details class="setup-collapsible" open>` so the browser's native disclosure handles per-section toggle without JS, but the `<summary>` is CSS-hidden until the page-level `data-setup-minimized="1"` attribute is set on `.home-mock`. - New "Minimize setup view" toggle inside the blue install-hero, rendered only when onboarded. Click flips the data-attr on `.home-mock` AND removes the `open` attribute from each `<details>`. State persists in `localStorage["agnes_home_setup_minimized"]` so the choice survives reloads but is per-device. - "Show full setup view" (the same button when minimized) re-opens both `<details>` and clears localStorage. When minimized, each `<details>` still has its own native expand/ collapse — click the gray summary bar to peek at one section without toggling the page-level minimize off. Tests: - test_step3_and_connectors_render_flat_when_onboarded_by_default — asserts `<details class="setup-collapsible" ... open>` for both sections post-onboarding and the absence of any server-rendered `data-setup-minimized` attribute on the `.home-mock` root. - test_minimize_toggle_visible_only_when_onboarded — toggle button rendered only when onboarded. Full pytest holds at 12 pre-existing failures (same set).	2026-05-08 18:28:47 +02:00
ZdenekSrotyr	df2c33147c	fix: Devin Review on #194 round 2 — 3 BUG-class findings 1. instance.yaml overlay path now matches read site under STATE_DIR. Three sites updated: - app/api/admin.py:1005 (server-config endpoint writer) - app/api/admin.py:2610 (configure endpoint writer) - app/instance_config.py:106 (overlay reader) All three now go through _state_dir() so under flat-mount layout (STATE_DIR=/data-state) the irreplaceable instance.yaml overlay lands on the state disk (sdc) instead of the regenerable data disk (sdb). Without this fix, .env_overlay correctly went to the state disk while instance.yaml went to the data disk — config would be lost if an operator wiped sdb. 2. Strip customer-specific tokens from OSS repo per CLAUDE.md vendor-agnostic rule: - docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)' → 'a deployer in production' - docker-compose.flat-mount.yml: 'caused 2026-05-05 in the Groupon FoundryAI deployment' → generic 'production failure mode' - docs/state-dir.md: rewrote the incident reference to describe the failure mode abstractly without naming the deployment; updated the recommendation table to say 'shadow-mount class' instead of dating the specific incident. 3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all read/write sites including the three migrated in this round (admin.py, instance_config.py, marketplaces.py). ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred — same operator-side class as auto-upgrade.sh hardcoded host-mount, documented limitation per the PR body.	2026-05-05 20:02:50 +02:00
ZdenekSrotyr	9f33e24bf9	fix(config): overlay-aware LLM consumers + env-ref resolution (#179 review) Devin BUG: /api/admin/configure seeds an ai: block to the writable overlay at DATA_DIR/state/instance.yaml, but the three LLM consumers imported from config.loader.load_instance_config — which reads the static config dir only. Even if they had read the overlay, the loader ran yaml.safe_load directly without passing through _resolve_env_refs, so '${ANTHROPIC_API_KEY}' would have stayed a literal placeholder. The pipeline appeared to work because the factory falls back to the env var directly, but the overlay path itself was dead code. Two fixes, both required: 1. Switched the three LLM consumers to app.instance_config.load_instance_config: - services/corporate_memory/collector.py:collect_all - services/verification_detector/__main__.py:main - app/api/admin.py:run_verification_detector 2. app/instance_config.py runs the loaded overlay through config.loader._resolve_env_refs before the deep-merge, so '${ANTHROPIC_API_KEY}' resolves at config-load time. New regression suite tests/test_instance_config_overlay.py pins: - env-ref resolution against the overlay (resolved when env set, empty when env missing — never the literal placeholder) - deep-merge still preserves static-only sections - the three consumers reach app.instance_config (inspected via inspect.getsource so a future refactor that reverts the import fails the test) - end-to-end: a seeded overlay + ANTHROPIC_API_KEY env reaches the factory with a resolved api_key	2026-05-05 05:57:22 +02:00
ZdenekSrotyr	d055417377	feat(config): default welcome template in jinja2 + sync_interval	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	83adf01bde	fix(v2): #134 BigQuery cross-project errors return structured 502/400 + BqAccess facade (#138 ) * docs(spec): #134 unify BigQuery access behind BqAccess facade Brainstorm output for issue #134. Captures: - root cause (incl. correction of the issue's hypothesis about commit 33a9964) - BqAccess facade API + project resolution rules - error contract — typed BqAccessError mapped to HTTP 502 for upstream BQ failures, 500 for deployment/config bugs - migration plan for v2_scan, v2_sample, RemoteQueryEngine - test rewrite eliminating _bq_client_factory injection point - E2E verification protocol on agnes-development as success criterion * docs(spec): #134 revise after first review Incorporates code-reviewer findings: Must-fix: - Add v2_schema (2 copies of INSTALL/LOAD/SECRET dance) to migration scope. - Reframe v2_scan headline: missing try/except around BQ calls is the actual cause of bare 500s, not project resolution (which 33a9964 fixed). - List two more deferred call sites (extractor.py, register_bq_table) with explicit rationale. Important: - Drop billing != data clause from cross_project_forbidden heuristic; rely only on 'serviceusage' substring. billing != data is normal for cross-project setup, was over-classifying. - Split bq_bad_request into _user (400) and _server (502) variants; add sql_origin parameter to translate_bq_error so call sites declare whether SQL contains user input. - Add @functools.cache to BqAccess.from_config; document tests bypass via dependency_overrides. - Replace monkey-patched-classmethod test pattern with BqAccess(client_factory=...) injection at construction time. Cleaner than today's _bq_client_factory and 1:1 migration shape. - Keep BqProjects.data (reviewer assumed registry has source_project; it doesn't). Multi-project explicitly listed as non-goal with note. Nice-to-have: - Add 'Implementation strategy' section: 2 staged commits (bug fix alone is revertable; refactor follows). - Extend E2E protocol to cover all three endpoints, not just /sample. - Note removal of stale docstring at src/remote_query.py:204. * docs(spec): #134 revision 3 — incorporates second-round review Must-fix from second review: - v2_schema split into two migration cases: _fetch_bq_schema translates errors via translate_bq_error; _fetch_bq_table_options preserves its swallow-all 'except Exception → return {}' so /schema doesn't 502 on partition-info failures. - RemoteQueryEngine.__init__ now resolves BqAccess lazily (in _get_bq_client, not in __init__). Without this, ~7 DuckDB-only tests in test_remote_query.py would suddenly fail with not_configured. - translate_bq_error pass-through for BqAccessError is now load-bearing (clause 1, before any Google-API branch). bq.client() raises BqAccessError for bq_lib_missing/auth_failed; without explicit pass-through those fall to 'unknown' and re-raise as bare 500. - Commit 1 now emits the SAME structured response shape as commit 2 to avoid contract churn between commits. - BIGQUERY_PROJECT env-var precedence is BREAKING for env-only deployments — flagged in CHANGELOG ### Changed. Editorial: - sql_origin renamed to bad_request_status with values 'client_error' / 'upstream_error' (clearer about what the parameter actually decides). bq_bad_request_user/_server kinds collapsed to bq_bad_request (400) and bq_upstream_error (502). - CLI (cli/commands/query.py) noted as external RemoteQueryEngine caller; unaffected because new bq_access kwarg has default None. - Added unit/integration tests for the new contracts: test_translate_passes_through_BqAccessError, test_v2_scan_returns_500_on_bq_lib_missing, test_v2_schema_returns_200_with_empty_partition_on_bq_failure, test_resolve_succeeds_after_config_set. - E2E protocol now covers /schema as the fourth endpoint. - Documented functools.cache-doesn't-cache-exceptions semantics and fixture nullcontext-doesn't-close caveat for nested sessions. * docs(spec): #134 revision 4 — incorporates third-round review Third reviewer verdict: 'implementation-ready with two trivial edits'; explicitly noted prior rounds did the heavy lifting. Edits: 1. get_bq_access() module-level function instead of @classmethod @functools.cache from_config. Removes the classmethod-cache stacking footgun (different Python versions wrap differently) and gives FastAPI's dependency introspection a clean function signature. Drops the 'Do not subclass BqAccess' caveat that no longer applies. 2. Commit 1 strategy explicitly: wrap _fetch_bq_sample (v2_sample), _bq_dry_run_bytes + _run_bq_scan (v2_scan), and _fetch_bq_schema (v2_schema strict block). Do NOT touch _fetch_bq_table_options swallow-all in commit 1 — preserved as-is, then migrated (still preserved) in commit 2. All three endpoints emit the same structured body shape so client parsers see one consistent contract throughout the staged rollout. No more half-rolled-out window where /sample is bare 500 while /scan is structured 502. * docs(plan): #134 implementation plan — Phase 1 (atomic bug fix) + Phase 2 (BqAccess refactor) + Phase 3 (verification) Bite-sized TDD tasks. 3 phases, 16 tasks total: Phase 1 (Commit 1) — atomic bug fix across all four v2 endpoints: Tasks 1.1-1.5 wrap _fetch_bq_sample, _bq_dry_run_bytes, _run_bq_scan, _fetch_bq_schema with structured 502/400 try/except. _fetch_bq_table_options preserved untouched. CHANGELOG Fixed entries. Phase 2 (Commit 2) — BqAccess facade extraction + migration: Tasks 2.1-2.5 build connectors/bigquery/access.py bottom-up (BqProjects, BqAccessError, translate_bq_error, default factories, BqAccess class, get_bq_access module-level cached). Task 2.6 adds conftest.py fixture. Tasks 2.7-2.9 migrate v2_scan, v2_sample, v2_schema to BqAccess. Tasks 2.10-2.11 migrate RemoteQueryEngine + tests (lazy bq_access, drop _bq_client_factory). Task 2.12 CHANGELOG Changed BREAKING + Internal. Phase 3 — Verification: 3.1 full pytest. 3.2 squash into two PR-shape commits. 3.3 manual E2E on agnes-development per spec protocol → close #134. Self-review table maps spec sections to implementing tasks; no gaps. * fix(v2): #134 structured 502/400 on BQ errors across /scan, /scan/estimate, /sample, /schema Wraps the BigQuery call sites in v2_scan, v2_sample, and v2_schema (strict block only) with try/except for google.api_core exceptions, translating to HTTPException with a structured body shape: {error, message, details}. Fixes Pavel's report (#134) where these endpoints returned bare HTTP 500 with no body when the SA on agnes-development hit cross-project Forbidden on serviceusage.services.use. Also fixes /sample's missing billing_project fallback (the bug 33a9964 fixed for /scan never landed here). Status code split: - /scan, /scan/estimate: BadRequest -> 400 (bq_bad_request) since SQL is user-derived from req.select/where/order_by. - /sample, /schema: BadRequest -> 502 (bq_upstream_error) since SQL is server-constructed from validated identifiers. - All Forbidden -> 502 with cross_project_forbidden if 'serviceusage' in error message (with hint pointing at data_source.bigquery.billing_project), else bq_forbidden. Body shape matches what the upcoming BqAccess refactor (next commit) will produce, so client-side parsers see one consistent contract throughout the staged rollout. _fetch_bq_table_options preserved exactly as-is — its swallow-all-and-return-empty contract is intentional and survives into the refactor; /schema continues to return 200 with empty partition info when partition queries fail. Outer wraps in scan_endpoint, scan_estimate_endpoint, sample, and schema endpoints exist only to make the test pattern (monkeypatching whole _fetch_* functions) work, and are tagged TODO(#134 Phase 2) for removal once BqAccess centralizes translation. * refactor(bq): #134 BqAccess facade — unify v2_scan, v2_sample, v2_schema, RemoteQueryEngine Extracts the duplicated BigQuery-access pattern (project resolution + client construction + DuckDB-extension session + Google-API error translation) into connectors/bigquery/access.py. Migrates four call sites to use it: - app/api/v2_scan.py — _bq_dry_run_bytes, _run_bq_scan - app/api/v2_sample.py — _fetch_bq_sample - app/api/v2_schema.py — _fetch_bq_schema (strict translation), _fetch_bq_table_options (preserves swallow-all best-effort contract) - src/remote_query.py — RemoteQueryEngine, lazy bq_access kwarg The new module exposes: - BqProjects (frozen dataclass: billing + data project IDs) - BqAccessError (typed exception with HTTP_STATUS class mapping) - BqAccess (facade with injectable client_factory/duckdb_session_factory for tests; defaults call the real google-cloud-bigquery + DuckDB extension) - get_bq_access (module-level @functools.cache; FastAPI Depends target) - translate_bq_error (Google API exception → BqAccessError mapper, with BqAccessError pass-through, 'serviceusage'-substring heuristic for cross_project_forbidden, and bad_request_status param distinguishing user-derived (400) from server-constructed (502) SQL) - _default_client_factory, _default_duckdb_session_factory RemoteQueryEngine.__init__ no longer accepts _bq_client_factory; tests migrate to bq_access=BqAccess(projects, client_factory=...). DuckDB-only RemoteQueryEngine tests need no changes — bq_access defaults to None and get_bq_access() is only invoked on first BQ call (lazy resolution). BqAccessError raised internally is translated to RemoteQueryError( error_type="bq_error") in _get_bq_client to preserve the engine's existing public contract — CLI and /api/query/hybrid callers see no change. Endpoint tests (test_v2_scan, test_v2_scan_estimate, test_v2_sample, test_v2_schema) migrate from monkey-patching whole _fetch_* functions to using the new bq_access fixture in tests/conftest.py — which exercises the REAL translation path through BqAccess + translate_bq_error, closing the test gap flagged in Task 1.1's review. Side-effect behavior change: v2_sample's FROM clause now uses the data project (instance.yaml data_source.bigquery.project), not the conflated billing_project from Phase 1. Documented in CHANGELOG ### Internal. BREAKING for deployments combining BIGQUERY_PROJECT env var with data_source.bigquery.project in instance.yaml — env var now overrides data project too. See CHANGELOG ### Changed. Two known-duplicate BQ-access sites (connectors/bigquery/extractor.py, scripts/duckdb_manager.register_bq_table) explicitly out of scope; tracked as follow-up. Removed stale docstring at the previous src/remote_query.py:204 that referenced scripts.duckdb_manager._create_bq_client as the default BQ client factory (RemoteQueryEngine never actually used that function). Test counts: tests/test_bq_access.py +27 (new), tests/test_v2_.py + tests/test_remote_query.py migrated to bq_access fixture (counts unchanged or +1-2 per file). Full suite: 2086 passed, 8 pre-existing failures (DB migration tests with unrelated internal_roles DependencyException — not introduced by this PR). fix(bq_access): translate DefaultCredentialsError to BqAccessError(auth_failed) CI on PR #138 caught: bigquery.Client(...) resolves Application Default Credentials at construction time; without ADC (CI without SA key, dev laptop without 'gcloud auth application-default login') it raises google.auth.exceptions.DefaultCredentialsError synchronously. Pre-fix _default_client_factory only caught ImportError, so DefaultCredentialsError propagated as raw exception — and from production endpoints would surface as bare 500 (the exact failure mode #134 sets out to fix). Now translates to BqAccessError(kind='auth_failed', details.hint='Run gcloud auth application-default login...'). Endpoint catch chain returns HTTP 502 with structured body. Adds unit test test_raises_auth_failed_on_default_credentials_error. Third-round spec review flagged this case in passing; the fix didn't land. CI's auth-less environment surfaced it. * fix(bq_access): get_bq_access() returns sentinel instead of raising when not configured Devin BUG_0001 on PR #138 review: 'get_bq_access() as FastAPI Depends breaks all v2 endpoints for non-BigQuery instances'. Pre-fix: get_bq_access() raised BqAccessError(not_configured) when neither BIGQUERY_PROJECT env nor data_source.bigquery.project was set. Because FastAPI resolves Depends() BEFORE the endpoint body runs, this exception fires during dep-injection — the endpoint's try/except BqAccessError clause never gets a chance to catch it. Result: every v2 request on Keboola-only or CSV-only instances returned bare HTTP 500, even for local-source tables that never touch BigQuery. Fix: get_bq_access() now returns a sentinel BqAccess with empty BqProjects and factories that raise BqAccessError(not_configured) on actual use. Construction succeeds, FastAPI's dep-injection cleanly yields the sentinel, the endpoint runs. The local-source code path in build_sample / build_schema / etc. never calls bq.client() or bq.duckdb_session() (it reads parquet directly), so non-BQ tables return 200 as before. Only when an endpoint actually tries to query BQ (source_type == 'bigquery') does the sentinel raise — and the endpoint's existing except BqAccessError catches it normally, returning structured 502 with hint. Test get_bq_access::test_raises_not_configured_when_neither_set renamed and rewritten to test_returns_sentinel_when_neither_set: asserts BqAccess is returned, then asserts client() and duckdb_session() each raise BqAccessError(not_configured) on call. Test test_does_not_cache_exceptions removed (no longer applicable) and replaced with test_sentinel_is_cached_per_process documenting the operator-restart-on-config-change contract. * docs(spec+plan): #134 genericize customer-specific tokens (CLAUDE.md OSS rule) Devin BUG_0001/0002 round 3 on PR #138: spec and plan docs contained customer-specific deployment hostnames, deployment names, and a GCP project ID that violated CLAUDE.md's vendor-agnostic OSS rule ('Nothing customer-specific belongs in code, configuration defaults, comments, docs, commit messages, PR titles, or PR bodies'). Replacements: agnes-development.groupondev.com -> <your-agnes-host> agnes-development -> <your-dev-instance> prj-grp-dataview-prod-1ff9 -> <your-data-project> s1_session_landings -> <bq_table_id> E2E verification semantics unchanged — operators still run the same four curls + config flip + retry, just substituting their own host / deployment name / project / table. * fix(bq_access): hook get_bq_access.cache_clear into instance_config.reset_cache Devin ANALYSIS_0004 on PR #138: get_bq_access is @functools.cache'd at process level, so it captures BigQuery project IDs at first call and ignores subsequent instance.yaml changes. Pre-Phase-2 the v2 endpoints re-read get_value() on every request, so admin /api/admin/server-config saves (which call instance_config.reset_cache()) hot-reloaded the BQ project. Without this fix, my refactor silently regresses that contract — operators editing instance.yaml via the admin UI would see no effect on v2 endpoints until container restart. instance_config.reset_cache() now also calls connectors.bigquery.access.get_bq_access.cache_clear() (lazy import, swallowed if connectors module isn't loaded — keeps instance_config usable in isolated unit tests). Adds test_instance_config_reset_cache_invalidates_get_bq_access as regression guard. Updates CHANGELOG Internal entry to mention the hot-reload contract + the not-configured sentinel behavior (round-3 fix from Devin BUG_0001 was previously only in commit message). * fix(bq_access): surface not_configured before identifier validation + plan path genericize Devin BUG_0001 + BUG_0002 round 5 on PR #138. BUG_0001 (plan doc): personal filesystem path violated CLAUDE.md vendor-agnostic rule. Replaced with '<worktree-root>' placeholder. BUG_0002 (sentinel error path): when get_bq_access() returns the sentinel BqAccess (BQ not configured), the empty bq.projects.data was reaching validate_quoted_identifier first and raising ValueError -> endpoint mapped to HTTP 400 'unsafe_identifier' instead of structured 500 'not_configured' with hint. Each fetch helper now checks 'if not bq.projects.data: bq.client()' as the first step, which triggers the sentinel's BqAccessError(not_configured). Endpoint catches the typed error and returns HTTP 500 with hint pointing at data_source.bigquery.project. Best-effort _fetch_bq_table_options returns {} silently in this case (preserves the swallow-all contract). * fix(bq_access): classify DuckDB-native exceptions from bigquery_query() via string match Devin ANALYSIS on PR #138 review (latest round). The DuckDB bigquery extension is a C++ plugin making its own HTTP calls — when BQ returns 403, it throws duckdb.IOException with the BQ error embedded as text, not gax.Forbidden. translate_bq_error's isinstance checks would miss these, falling to case 7 → bare 500 in production for v2_scan, v2_sample, and v2_schema (the bigquery_query() paths). Fix: last-resort string-match heuristic before the re-raise. 'Forbidden' / '403' / 'Bad Request' / '400' in the lowercased message classifies via the same kind hierarchy. The 'serviceusage' substring still distinguishes cross_project_forbidden from bq_forbidden. Specific enough that random exceptions without HTTP-error keywords still re-raise. Adds 4 unit tests covering the new heuristic + the 'don't swallow random exceptions' invariant. * chore(release): cut 0.22.0 PR #138 contains issue #134 user-visible behavior changes: - BREAKING: BIGQUERY_PROJECT env var now overrides instance.yaml data_source.bigquery.project for v2 endpoints (previously RemoteQueryEngine billing only). - Fixed: structured 502/400 on /api/v2/sample, /scan, /scan/estimate, /schema when BigQuery raises Forbidden/BadRequest (was bare 500). - Internal: BqAccess facade refactor unifying four duplicate BQ-access call sites; instance_config.reset_cache() now invalidates BqAccess cache too so admin server-config saves hot-reload BQ project IDs. Bumps to 0.22.0 because PR #137 merged first and took 0.21.0.	2026-04-30 10:11:20 +02:00
ZdenekSrotyr	a222f92e70	feat(admin): server configuration editor + 0.13.0 (#107 ) Adds /admin/server-config UI for editing instance.yaml from the web. Hardening: SSRF gate on data_source URLs, narrow-overlay write strategy, atomic writes, audit log with secret masking on shape changes, threading lock on read-modify-write, corrupt-overlay refusal on write side + louder log on read side, modal Promise resolution on backdrop dismiss, sentinel scrub on save (defense-in-depth client+server). Bundles Windows PowerShell wrapper from #80. Cuts release v0.13.0.	2026-04-29 00:47:23 +02:00
ZdenekSrotyr	49f109bf73	fix: address PR review findings — config write, CalVer, error handling - Config writes to DATA_DIR/state/instance.yaml (writable) instead of CONFIG_DIR (read-only :ro in Docker) - instance_config.py checks DATA_DIR/state/ first, then falls back to CONFIG_DIR for backward compat - CalVer counter is now global across channels (-YYYY.MM.) per spec - Keboola error messages sanitized — log full error, return generic msg - chmod in secrets.py wrapped in try/except for Windows compat - Setup wizard JS handles 401 (expired JWT) with user-facing message - deploy.yml changed to workflow_dispatch only (no duplicate test runs) - Smoke test uses docker-compose.prod.yml + AGNES_TAG instead of sed - docker-compose.prod.yml uses ${AGNES_TAG:-stable} env var 663 tests pass. 8 E2E verification tests pass.	2026-04-10 13:16:40 +02:00
ZdenekSrotyr	7f523788c2	fix: correct YAML path for instance name and subtitle get_instance_name and get_instance_subtitle now look up the nested instance.name and instance.subtitle keys to match the YAML structure.	2026-04-09 16:31:56 +02:00
ZdenekSrotyr	1287e63ed9	feat: complete system — web UI, all API endpoints, governance, admin, CLI commands Major additions: - Web UI: Jinja2 templates in FastAPI (login, dashboard, catalog, corporate memory, admin) - API: catalog profiles/metrics, telegram verify/unlink/status, admin table registry CRUD - Corporate memory governance: approve/reject/mandate/revoke/edit/batch + audit log - Sync: real DataSyncManager trigger, sync-settings, table-subscriptions - CLI: setup (init/test/deploy/verify), server (logs/restart/deploy/backup), explore - Instance config integration (instance.yaml loaded at startup) - 140 tests passing (25 new)	2026-03-27 16:52:22 +01:00

11 commits