agnes-the-ai-analyst/app/instance_config.py
Vojtech 2e2e1a1eca
feat(home): state-aware /home + /setup-advanced + schema v26 (#228)
* feat(home+news): state-aware /home + /news + admin-edited news section

Squash of the vr/home-page feature work for clean rebase onto main.
Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase.

What's in this PR:

**State-aware /home page**
- New `/home` route with hero + auto-mode + connectors (Asana / GWS /
  Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine
  branches a single template (`home_not_onboarded.html`); the install
  steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per-
  connector setup prompts hide once `users.onboarded=TRUE`. A
  completion badge replaces them.
- "Mark me as offboarded" button reverses the flag, so no manual SQL UPDATE is needed.
- `users.onboarded BOOLEAN` column added; default FALSE; flipped by the
  CLI's `agnes init` post-success POST and the `/admin/users` API.
- Connector setup prompts pre-check whether the tool is already
  installed/connected before re-running setup.
- GWS scope set widened to include Google Chat (`chat.spaces`,
  `chat.messages`).

**Single template + design tokens**
- `dashboard.html` now extends `base.html` via the new
  `{% block layout %}` opt-out (full-width pages skip the 800px
  `.container`). Net: every page shares one shell.
- `style-custom.css` `:root` extended with `--space-{7,9,10,12}`,
  `--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`,
  `--focus-ring`, `--transition-*`, `--width-{narrow,app,wide}` so
  inline page styles can migrate incrementally.

**Auth redirects honor AGNES_HOME_ROUTE**
- `safe_next_path` resolves the configured home route when no `default=`
  is passed; OAuth callbacks, magic-link clicks, password form, and
  LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator
  picked) instead of always /dashboard.
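
A minimal sketch of the redirect fallback described above, under stated assumptions. `resolve_home_route` mirrors the validation in `get_home_route` from this file; `safe_next_path_sketch` and its parameters are hypothetical stand-ins, not the real `safe_next_path` signature:

```python
import os
from typing import Optional

DEFAULT_HOME = "/dashboard"


def resolve_home_route(yaml_value: Optional[str] = None) -> str:
    """Env var wins over the YAML value; invalid values fall back to /dashboard."""
    raw = os.environ.get("AGNES_HOME_ROUTE") or yaml_value or DEFAULT_HOME
    route = raw.strip()
    # Must be a site-relative path; "//host" would be a protocol-relative URL.
    if not route.startswith("/") or route.startswith("//"):
        return DEFAULT_HOME
    return route


def safe_next_path_sketch(
    next_path: Optional[str],
    default: Optional[str] = None,
    yaml_home: Optional[str] = None,
) -> str:
    """Hypothetical helper: accept only local paths, else use the fallback.

    When no explicit ``default`` is passed, the fallback is the configured
    home route rather than a hard-coded /dashboard.
    """
    fallback = default if default is not None else resolve_home_route(yaml_home)
    candidate = (next_path or "").strip()
    is_local = candidate.startswith("/") and not candidate.startswith("//")
    return candidate if is_local else fallback
```

With `AGNES_HOME_ROUTE=/home`, a missing or external `next` value lands on `/home`, while an explicit `default=` still takes precedence over the configured route.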

**News section + /news permalink + /admin/news editor**
- Schema-bumped `news_template` table (single versioned entity, draft +
  publish gate). `published BOOLEAN` distinguishes draft from public;
  monotonically-increasing `version` per save; rows >30d pruned on
  save except the currently-displayed published version.
- `/home` bottom-of-page renders the latest published intro with a
  "Read more →" link to `/news` (which renders the full body).
- `/admin/news` editor with sandboxed live preview, versions table,
  per-row Unpublish, Format-help cheatsheet.
- `agnes admin news show / draft / edit / publish / unpublish /
  versions / export` (CLI). Talks to the live server via the
  `/api/admin/news/*` endpoints (PAT-authed) — no direct DB access
  so it coexists with a running uvicorn.
- **Optimistic-lock guard**: `agnes admin news publish --version N` and
  PUT/PATCH endpoints accept `expected_version` and 409 with structured
  `{error: "version_conflict", expected, actual, actual_by}` when a
  concurrent admin replaced the draft. Edit refuses to overwrite a
  draft authored by someone else without `--force` or
  `--expect-version`.
- nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips
  any iframe whose src is not on the YouTube/Vimeo/Loom allowlist;
  javascript:/data: schemes blocked everywhere.
- Author CSS vocabulary: `.news-hero` (blue gradient hero block),
  `.callout`/`.callout-{info,warn,success,danger}`,
  `.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` —
  all consolidated in `style-custom.css` under "News content
  vocabulary (shared)" so /home perex, /news body, and /admin/news
  preview share one source of styling.
- Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver).
- `.news-content` table styling (border, header band, row-hover).
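
The optimistic-lock guard in the list above can be sketched as follows. This is an illustration only: the `Draft` dataclass and `save_draft` function are hypothetical names, and the only detail taken from this PR is the shape of the 409 payload:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Draft:
    version: int
    author: str
    body: str


def save_draft(
    current: Draft,
    body: str,
    author: str,
    expected_version: Optional[int] = None,
) -> Tuple[int, dict, Draft]:
    """Return (http_status, payload, stored_draft).

    If the caller states which version it edited and the stored draft has
    moved on since, refuse with 409 instead of silently overwriting.
    """
    if expected_version is not None and expected_version != current.version:
        payload = {
            "error": "version_conflict",
            "expected": expected_version,
            "actual": current.version,
            "actual_by": current.author,
        }
        return 409, payload, current
    # Versions increase monotonically on every successful save.
    saved = Draft(version=current.version + 1, author=author, body=body)
    return 200, {"version": saved.version}, saved


# Two admins both loaded version 3; the second save is rejected.
stored = Draft(version=3, author="alice", body="old")
status_a, _, stored = save_draft(stored, "alice's edit", "alice", expected_version=3)
status_b, conflict, stored = save_draft(stored, "bob's edit", "bob", expected_version=3)
```

In this sketch the second writer receives the conflict payload with `actual_by` naming the admin who replaced the draft, which is what lets a CLI decide whether to retry or require `--force`.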

**`scripts/dev/run-local.sh`** — local uvicorn launcher. Pulls Google
OAuth client id/secret from GCP Secret Manager
(`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points
`AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and
`--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one-
command iteration. `LOCAL_DEV_MODE=1` also enables the FastAPI debug
toolbar.

**CLAUDE.md "Run tests before every push" section** codifies
`pytest tests/ -n auto -q` as non-negotiable before each push.

**Tests**: 51 + 14 + 8 = 73 new tests across news-template repo,
sanitizer, API, web, CLI; plus updated home/auth/template tests for
the new shared-shell architecture.

Origin docs (gitignored, customer-fork content):
docs/brainstorms/home-page-requirements.md,
docs/plans/2026-05-07-001-feat-home-page-plan.md.

* feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle

User-facing equivalent of the in-page "Mark me as (off)boarded" button
on /home. POSTs /api/me/onboarded with {onboarded, source}; --source
overrides the audit-log marker so flips made from the CLI vs the web
button vs agnes init automation stay distinguishable.

`status` reads via /api/me/profile (when present); falls back to a
quick body-marker scan of /home so the read path doesn't write an
audit_log row. PAT-authed via cli.client.api_post — same convention
as agnes admin news / agnes admin add-user etc.

Tests: 5 covering on/off/status round-trip, idempotency, and
audit-log source recording. Full suite holds at 12 pre-existing
failures (same set as before).

* ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix

Primary nav (post-rebase audit + per-user feedback):

- Items: Home → Marketplace → Data Packages → Memory. Admin dropdown
  for admins only. The "Dashboard" label was renamed to Home; the link
  still resolves through `home_route`, so customer instances on
  /dashboard still land there.
- Activity Center moved into the Admin dropdown. Per-team adoption
  analytics is admin-consumed in practice; the route still allows
  any authed user for direct deep-links so existing /home tile +
  bookmarks keep working.
- Memory link added (→ /corporate-memory) — was previously buried in
  the /home "Look around" tiles.
- Setup local agent + My Stack dropped from main nav. Setup now lives
  in the /home install flow; My Stack lives as a tab inside
  /marketplace.

/home tweaks:

- Plugin marketplace tile now points at /marketplace (was /store —
  legacy from before the marketplace rebrand landed in #230).
- "What's new" section header gets a green band (success-flavored
  #D1FAE5 background, #A7F3D0 border, darker green title) so the
  bottom-of-page news block is visibly distinct from the blue
  install-hero at the top. Header strip only; body stays white.

Test fix: in test_home_route_resolution, `dashboard_link_uses_home_route`
was renamed to `home_link_uses_home_route` and now asserts
`href="/home">Home` instead of `href="/home">Dashboard` after the label
change.

* fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag

The server-side `users.onboarded` flip happens through two paths:

1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`.
2. Implicit `agnes init` POST → /api/me/onboarded on success.

Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow
reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto-
collapse to summary bars. They were actively working through those
sections — the install POST never signalled "I'm done with the rest
of setup", just "Agnes itself is installed".

Decouple the section-collapse decision from the server flag:

- Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE`
  (their completion is a hard server signal — Agnes IS installed).
- Step 3 + Connect-your-tools: render flat by default in BOTH states.
  Wrapped in `<details class="setup-collapsible" open>` so the
  browser's native disclosure handles per-section toggle without JS,
  but the `<summary>` is CSS-hidden until the page-level
  `data-setup-minimized="1"` attribute is set on `.home-mock`.
- New "Minimize setup view" toggle inside the blue install-hero,
  rendered only when onboarded. Click flips the data-attr on
  `.home-mock` AND removes the `open` attribute from each
  `<details>`. State persists in `localStorage["agnes_home_setup_minimized"]`
  so the choice survives reloads but is per-device.
- "Show full setup view" (the same button when minimized) re-opens
  both `<details>` and clears localStorage.

When minimized, each `<details>` still has its own native expand/
collapse — click the gray summary bar to peek at one section without
toggling the page-level minimize off.

Tests:
- test_step3_and_connectors_render_flat_when_onboarded_by_default —
  asserts `<details class="setup-collapsible" ... open>` for both
  sections post-onboarding and the absence of any server-rendered
  `data-setup-minimized` attribute on the `.home-mock` root.
- test_minimize_toggle_visible_only_when_onboarded — toggle button
  rendered only when onboarded.

Full pytest holds at 12 pre-existing failures (same set).
2026-05-08 18:28:47 +02:00


"""Instance configuration — loads instance.yaml and exposes to FastAPI."""
import logging
import os
from pathlib import Path
from typing import Any, Optional
logger = logging.getLogger(__name__)
_instance_config: Optional[dict] = None


def reset_cache() -> None:
    """Drop the in-process instance.yaml cache; the next ``load_instance_config``
    call re-reads from disk. Used by `/api/admin/server-config` after a save.

    Public alias so callers don't have to reach into the private global.

    Also clears ``connectors.bigquery.access.get_bq_access`` so the v2 endpoints
    pick up new BigQuery project IDs after an admin saves `instance.yaml`.
    Without this, `get_bq_access`'s `@functools.cache` would freeze the projects
    at first call and require a container restart to pick up changes (Devin
    ANALYSIS_0004 on PR #138). Lazy-imported so this module stays usable in
    environments where the connectors package can't be imported (e.g. unit
    tests of instance_config in isolation)."""
    global _instance_config
    _instance_config = None
    try:
        from connectors.bigquery.access import get_bq_access
        get_bq_access.cache_clear()
    except Exception:
        # Connectors module not loaded yet, or BQ deps missing; both are fine.
        pass


def _deep_merge(base: dict, patch: dict) -> dict:
    """Deep-merge `patch` into `base`, returning a new dict.

    Dict-into-dict recurses; everything else (scalars, lists, None) is
    replaced wholesale. Used so the writable overlay can hold only the
    sections an operator has touched, while everything else flows from
    the static file unchanged. Same semantics as the helper in
    `/api/admin/server-config`'s POST handler.
    """
    out = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = _deep_merge(out[key], value)
        else:
            out[key] = value
    return out


def load_instance_config() -> dict:
    """Load instance.yaml as a deep-merge of the static file and the
    writable overlay.

    Resolution:
      1. Static base: ``CONFIG_DIR/instance.yaml`` via ``config.loader``
         (the source of truth for sections the editor doesn't expose:
         ``datasets``, ``corporate_memory``, ``openmetadata``, etc.).
      2. Overlay patch: ``DATA_DIR/state/instance.yaml`` (written by
         ``/api/admin/configure`` and ``/api/admin/server-config``;
         contains only the sections those endpoints accept).
      3. Overlay wins per-leaf via deep-merge: operator edits persist,
         static-only sections still flow through.

    Pre-2026-04-28 this function returned the overlay verbatim when it
    existed and only fell back to static when it didn't. That was a
    silent footgun: the moment someone saved any section through the
    new editor (which writes a narrow overlay by design), every
    consumer of static-only sections (corporate memory page, dataset
    list, OpenMetadata client) saw empty defaults. See PR #107.
    """
    global _instance_config
    if _instance_config is not None:
        return _instance_config

    import yaml

    # Static base: strict validation lives in config.loader.
    base: dict = {}
    try:
        from config.loader import load_instance_config as _load
        base = _load() or {}
        logger.info("Loaded instance.yaml base from config/")
    except Exception as e:
        logger.warning("Could not load static instance.yaml: %s", e)

    # Overlay patch from the writable volume. Best-effort: a corrupt
    # overlay shouldn't take the app offline (we'd rather serve stale/base
    # config than 500 every request), but log loudly with a traceback so
    # the corruption surfaces in the operator's logs immediately. The
    # write-side endpoints (POST /api/admin/server-config and /configure)
    # refuse to overwrite a corrupt overlay with HTTP 500, so an admin
    # noticing the saves break is the second line of defence.
    #
    # ${ENV_VAR} interpolation: ``config.loader.load_instance_config`` runs
    # the static base through ``_resolve_env_refs`` already, but raw
    # ``yaml.safe_load`` here would leave overlay strings like
    # ``${ANTHROPIC_API_KEY}`` as literal placeholders. /api/admin/configure
    # writes exactly that string into the seeded ai: block (#176), so we
    # mirror the resolver here before the deep-merge. Without it, the
    # LLM factory receives the literal placeholder and rejects it as an
    # invalid api key (#179 review fix).
    #
    # Resolve via _state_dir() so the path matches the writer in
    # app/api/admin.py. Under the flat-mount layout (STATE_DIR=/data-state)
    # both the configure-endpoint and the server-config-endpoint write
    # ``/data-state/instance.yaml``; reading from ``/data/state/...`` here
    # would silently load stale config from the regenerable data disk.
    from app.secrets import _state_dir
    overlay_path = _state_dir() / "instance.yaml"
    if overlay_path.exists():
        try:
            overlay = yaml.safe_load(overlay_path.read_text()) or {}
            from config.loader import _resolve_env_refs
            overlay = _resolve_env_refs(overlay)
            base = _deep_merge(base, overlay)
            logger.info("Merged overlay from %s", overlay_path)
        except Exception:
            logger.exception(
                "instance.yaml overlay at %s is corrupt; falling back to "
                "static base config; saves through the editor will refuse "
                "until the file is repaired", overlay_path,
            )

    _instance_config = base
    return _instance_config


def get_value(*keys, default=None) -> Any:
    """Get nested value from instance config."""
    config = load_instance_config()
    current = config
    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
        else:
            return default
        if current is None:
            return default
    return current


def get_data_source_type() -> str:
    return os.environ.get("DATA_SOURCE", get_value("data_source", "type", default="local"))


def get_home_route() -> str:
    """Path that ``/`` redirects to for an authenticated user.

    Resolution order: ``AGNES_HOME_ROUTE`` env var (Terraform-friendly,
    overrides everything) > ``instance.home_route`` in instance.yaml >
    default ``/dashboard``. The env-overrides-yaml shape mirrors
    :func:`get_data_source_type` (precedent in this file) so operators
    can flip a fork to ``/home`` per-deployment without forking the
    YAML.

    Validated to start with ``/`` and not ``//`` so a misconfigured
    value can't pivot the root redirect to an external host.
    """
    raw = os.environ.get("AGNES_HOME_ROUTE") or get_value(
        "instance", "home_route", default="/dashboard"
    )
    route = (raw or "").strip()
    if not route.startswith("/") or route.startswith("//"):
        return "/dashboard"
    return route


def get_gws_oauth_credentials() -> dict:
    """Pre-configured Google Workspace CLI OAuth client (client_id + secret).

    When set, /home renders a connector prompt that tells the analyst (and
    Claude) to export `GOOGLE_WORKSPACE_CLI_CLIENT_ID` and
    `GOOGLE_WORKSPACE_CLI_CLIENT_SECRET` and skip the "create your own GCP
    project" walkthrough, because the operator has already provisioned a
    shared OAuth app for the instance. When unset, the prompt falls back to
    the manual `gws auth setup` flow.

    OAuth client_id + secret here are app identifiers for an installed
    "Desktop app" OAuth client, not a per-user secret. They're rendered
    into the public /home page on purpose: they identify the OAuth app,
    and the redirect-URI / scope guardrails on the GCP-side OAuth client
    are what enforce safety. Treat them like a publishable bundle ID,
    not a credential.

    Resolution order (env-overrides-yaml, mirrors :func:`get_home_route`):
      - ``AGNES_GWS_CLIENT_ID`` env > ``instance.gws.client_id`` YAML > ""
      - ``AGNES_GWS_CLIENT_SECRET`` env > ``instance.gws.client_secret`` YAML > ""
      - ``AGNES_GWS_PROJECT_ID`` env > ``instance.gws.project_id`` YAML >
        numeric prefix of the client_id > ""
      - ``AGNES_GWS_OAUTHLIB_INSECURE_TRANSPORT`` env >
        ``instance.gws.oauthlib_insecure_transport`` YAML > "1"
        (kept as "1" by default because the gws CLI binds an HTTP loopback
        on 127.0.0.1:8080 for the OAuth redirect, and Google's oauthlib
        refuses non-HTTPS redirects without this flag).

    Both id and secret must be set for the configured branch to engage
    (``configured`` is True only then); a half-configured instance falls
    back to manual setup with a warning.
    """
    cid = os.environ.get("AGNES_GWS_CLIENT_ID") or get_value(
        "instance", "gws", "client_id", default=""
    )
    secret = os.environ.get("AGNES_GWS_CLIENT_SECRET") or get_value(
        "instance", "gws", "client_secret", default=""
    )
    insecure = os.environ.get("AGNES_GWS_OAUTHLIB_INSECURE_TRANSPORT") or get_value(
        "instance", "gws", "oauthlib_insecure_transport", default="1"
    )
    project_id = os.environ.get("AGNES_GWS_PROJECT_ID") or get_value(
        "instance", "gws", "project_id", default=""
    )
    cid = (cid or "").strip()
    secret = (secret or "").strip()
    project_id = (project_id or "").strip()
    # Derive project_id from the client_id when not explicitly set. Google's
    # OAuth client_id format is "<numeric-project-number>-<random>.apps.
    # googleusercontent.com"; the numeric prefix is required by the
    # client_secret.json schema (gws CLI's Rust struct treats it as
    # non-Option). Falls back to "" when the client_id is empty or
    # malformed; the configured branch in the template degrades gracefully.
    if not project_id and cid and "-" in cid:
        project_id = cid.split("-", 1)[0]
    return {
        "client_id": cid,
        "client_secret": secret,
        "project_id": project_id,
        "oauthlib_insecure_transport": str(insecure).strip() or "1",
        "configured": bool(cid and secret),
    }


def get_home_automode_visibility() -> bool:
    """Whether /home renders the "Step 3" install block (turn on
    auto-accept mode). Auto-accept mode is the recommended middle ground
    between default per-action prompting (slow) and full YOLO
    (`--dangerously-skip-permissions`, broad blast radius).

    Cautious-rollout instances can hide the section by setting
    ``AGNES_HOME_SHOW_AUTOMODE=0`` so users learn the permission flow
    first; the same content stays available on /setup-advanced.

    Resolution: env var > ``instance.home.show_automode`` YAML > True.
    Mirrors :func:`get_home_route` shape so Terraform overrides work
    the same way.
    """
    raw = os.environ.get("AGNES_HOME_SHOW_AUTOMODE")
    if raw is None:
        raw = get_value("instance", "home", "show_automode", default=True)
    if isinstance(raw, bool):
        return raw
    return str(raw).strip().lower() not in ("0", "false", "no", "off", "")


def get_instance_name() -> str:
    return get_value("instance", "name", default="AI Data Analyst")


def get_instance_subtitle() -> str:
    return get_value("instance", "subtitle", default="")


def get_sync_interval() -> str:
    """Human-readable refresh cadence shown in the analyst welcome prompt."""
    return get_value("instance", "sync_interval", default="1 hour")


def get_allowed_domains() -> list:
    domain = get_value("auth", "allowed_domain", default="")
    if domain:
        return [d.strip() for d in domain.split(",") if d.strip()]
    return []


def get_datasets() -> dict:
    return get_value("datasets", default={})


def get_theme() -> dict:
    return get_value("theme", default={})


def get_auth_config() -> dict:
    return get_value("auth", default={})


def get_corporate_memory_config() -> dict:
    return get_value("corporate_memory", default={})
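
A worked example of the overlay semantics used by `load_instance_config`, as a standalone sketch. `deep_merge` below is a copy of `_deep_merge` from this file so the snippet runs on its own; the sample `static_base` and `overlay` dicts are made-up illustration data, not real instance config:

```python
def deep_merge(base: dict, patch: dict) -> dict:
    """Copy of _deep_merge: dicts recurse, everything else is replaced."""
    out = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out


# Hypothetical static instance.yaml content.
static_base = {
    "instance": {"name": "AI Data Analyst", "home_route": "/dashboard"},
    "datasets": {"sales": {"tables": ["orders", "customers"]}},
    "auth": {"allowed_domain": "example.com"},
}

# Hypothetical overlay holding only what an operator edited.
overlay = {
    "instance": {"home_route": "/home"},
    "datasets": {"sales": {"tables": ["orders"]}},
}

merged = deep_merge(static_base, overlay)

# Leaf override: home_route changes, the sibling "name" key survives.
print(merged["instance"])
# Lists are replaced wholesale, not concatenated.
print(merged["datasets"]["sales"]["tables"])
# Sections absent from the overlay flow through from the static base.
print(merged["auth"])
```

The narrow overlay changes only the leaves it names, which is the property that keeps static-only sections such as `datasets` and `corporate_memory` visible after an admin saves through the editor.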