Commit graph

329 commits

Author SHA1 Message Date
ZdenekSrotyr
bca5e91826 feat: add bootstrap endpoint + deploy skill for AI agents
- POST /auth/bootstrap — creates first admin, self-deactivates after
- da setup bootstrap — CLI command for agent-driven setup
- da setup verify — structured health check (JSON output for agents)
- cli/skills/deploy.md — complete deployment guide for AI agents
- 6 bootstrap tests including full agent deployment flow simulation
- 156 total tests passing
2026-03-30 14:01:01 +02:00
ZdenekSrotyr
a74f69d6b1 chore: exclude CI workflow from push (needs workflow scope) 2026-03-27 17:41:27 +01:00
ZdenekSrotyr
0b91d4ac47 feat: complete web UI + auth providers + template compatibility
All 7 web pages rendering (200):
  /login, /dashboard, /catalog, /corporate-memory,
  /corporate-memory/admin, /activity-center, /admin/tables

All 13 API endpoints working (200):
  health, sync, data, query, users, memory, scripts,
  settings, telegram, admin, catalog

Auth providers: Google OAuth, Password (argon2), Email magic link
Cookie-based JWT auth for web UI after OAuth redirect
FlexDict for Flask→FastAPI template compatibility
150 tests passing
2026-03-27 17:34:39 +01:00
ZdenekSrotyr
1a7939c594 feat: add auth providers (Google OAuth, Password, Email magic link) + web UI fixes
- Google OAuth with authlib + auto user creation + cookie-based JWT
- Password auth with argon2 hash + setup token flow
- Email magic link with SMTP/SendGrid support
- Cookie-based auth for web UI (after OAuth redirect)
- Dashboard template compatibility (user_info, activity, desktop status)
- 150 tests passing
2026-03-27 17:07:59 +01:00
ZdenekSrotyr
fb1e60d8e1 fix: fix TemplateResponse API for Starlette compatibility
Use new TemplateResponse(request, name, context) signature.
Add Flask compat shims (get_flashed_messages, url_for, session).
2026-03-27 16:59:04 +01:00
ZdenekSrotyr
1287e63ed9 feat: complete system — web UI, all API endpoints, governance, admin, CLI commands
Major additions:
- Web UI: Jinja2 templates in FastAPI (login, dashboard, catalog, corporate memory, admin)
- API: catalog profiles/metrics, telegram verify/unlink/status, admin table registry CRUD
- Corporate memory governance: approve/reject/mandate/revoke/edit/batch + audit log
- Sync: real DataSyncManager trigger, sync-settings, table-subscriptions
- CLI: setup (init/test/deploy/verify), server (logs/restart/deploy/backup), explore
- Instance config integration (instance.yaml loaded at startup)
- 140 tests passing (25 new)
2026-03-27 16:52:22 +01:00
ZdenekSrotyr
c5527ec153 fix: harden script sandbox and SQL query security
Fixes found by E2E QA agent:
- Script sandbox: block os, sys, socket, eval, exec, open, __import__,
  getattr, pathlib and 20+ other dangerous patterns
- SQL query: block COPY, ATTACH, read_csv, semicolons, non-SELECT
- Added 24 security tests covering all attack vectors
2026-03-27 16:11:05 +01:00
ZdenekSrotyr
07b396bfe2 docs: add refactoring plan, design spec, and gitignore updates 2026-03-27 15:42:57 +01:00
ZdenekSrotyr
e0ce91ddb9 feat: add dataset permissions, script execution, Kamal config, CI/CD
- SyncSettingsRepository + DatasetPermissionRepository with RBAC
- Script deploy/run/undeploy API with import sandboxing
- User sync settings API with permission checks
- 4 CLI skills (connectors, security, notifications, corporate-memory)
- Kamal production + staging configs
- GitHub Actions CI + deploy workflows
- 91 total tests passing
2026-03-27 15:40:11 +01:00
ZdenekSrotyr
3701130a11 feat: add Docker, CLI tool, scheduler, and agent skills
- Dockerfile (uv-based) + docker-compose.yml (3 services)
- CLI tool 'da' with commands: auth, sync, query, status, admin, diagnose, skills
- Scheduler sidecar service (replaces systemd timers)
- pyproject.toml for uv distribution
- Built-in skills (setup, troubleshoot) for AI agents
- 17 CLI tests, 75 total tests passing
2026-03-27 15:30:03 +01:00
ZdenekSrotyr
a3918d3833 feat: add FastAPI server with auth, RBAC, and all API endpoints
- JWT auth with role-based access control (viewer/analyst/admin/km_admin)
- Endpoints: health, sync manifest, data download, query, users CRUD,
  corporate memory, session/artifact upload
- 18 API tests covering auth, RBAC, all endpoints
2026-03-27 15:19:18 +01:00
ZdenekSrotyr
64acc8d731 feat: add JSON to DuckDB migration script with tests 2026-03-27 15:09:06 +01:00
ZdenekSrotyr
79b0b66f2e feat: add DuckDB state layer with all repository classes
- src/db.py: schema with 14 tables matching design spec
- 7 repository classes: SyncState, Users, Knowledge, Audit,
  Telegram, PendingCode, Script, TableRegistry, Profiles
- 37 tests covering all CRUD operations
2026-03-27 15:06:55 +01:00
ZdenekSrotyr
f76411c603 feat: add DuckDB state layer with schema management
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:55:54 +01:00
Petr
eb7e5bdf8f Add data freshness indicators and remote table visibility to UI
- Fix sync_state.json parsing: derive last_updated from table last_sync
  timestamps when root-level field is missing (flat format support)
- Parse ALL YAML blocks from data_description.md (was only first block)
- Show remote tables (daily_deal_traffic) in catalog with "Live" badge
- Show per-table sync timestamps and Local/Live query mode badges
- Add data freshness note to Business Metrics section
- Dashboard: fix "Not yet synced" bug, show local/live table breakdown
2026-03-25 16:24:26 +01:00
Petr
a667b4e32f Fix profiler crash for remote-only tables without primary_key
Same issue as config.py - profiler's TableInfo and parser required
primary_key and sync_strategy, breaking auto-profile after sync
when daily_deal_traffic (remote-only) is in config.
2026-03-25 14:47:00 +01:00
Petr
4ebb3fc7b2 Fix data sync crash: make primary_key and sync_strategy optional
Remote-only tables (query_mode="remote") like daily_deal_traffic
don't need primary_key or sync_strategy. The parser used hard
lookups (table_data["primary_key"]) causing KeyError and breaking
all data sync since 2026-03-21.

Changes:
- TableConfig: default primary_key="" and sync_strategy="none"
- Parser: use .get() with defaults instead of [] lookups
- Validator: add "none" as valid sync_strategy
2026-03-25 14:43:22 +01:00
Petr
74ecf66f80 Increase knowledge item content limit from 500 to 1000 chars 2026-03-24 00:12:15 +01:00
Petr
0560bbc127 Rename Mandate button to Make Mandatory 2026-03-23 19:44:08 +01:00
Petr
e85d296b0a Add Corporate Memory admin review queue UI (Phase 2)
Admin page at /corporate-memory/admin with three tabs:
- Review Queue: pending items with approve/mandate/reject + batch ops
- All Items: status filter, promote/demote/revoke actions
- Audit Log: filterable action history table

Features:
- Keyboard shortcuts (j/k navigate, a/r/m = approve/reject/mandate)
- Inline mandate form (mandatory reason + audience targeting)
- Toast notifications on action success/error
- Pending count badge on main Corporate Memory page
- Matches existing visual design (CSS variables, card styles)
2026-03-23 19:32:33 +01:00
Petr
1318b74ff1 Add Corporate Memory governance — Phase 1 (data model + admin API)
Add admin curation layer between AI extraction and knowledge distribution.
Admins (km_admin flag in instance.yaml) can approve, reject, mandate, and
revoke knowledge items. Mandatory items distribute to all targeted users
automatically.

Three governance modes (configurable per instance):
- mandatory_only: admin controls everything, no user voting
- admin_curated: admin controls, users vote as feedback signal
- hybrid: mandatory from admin + optional from user voting

Three approval workflows:
- review_queue: nothing published without admin approval
- auto_publish: items go live immediately, admin intervenes retroactively
- threshold: confidence-based auto-publish (Phase 5)

Includes:
- 9 admin action functions (approve/reject/mandate/revoke/edit/batch/...)
- 11 new admin API endpoints under /api/corporate-memory/admin/
- Immutable audit log (audit.jsonl)
- Audience targeting via groups
- Automatic migration of existing items to "approved" status
- km_admin_required auth decorator
- 69 tests covering all governance logic
- Backward compatible: no config = legacy wiki behavior
2026-03-23 19:15:33 +01:00
Petr
c04791b702 Suppress httpcore debug logging in LLM connector 2026-03-23 12:57:35 +01:00
Petr
f619fadc42 Fix SSL verification and suppress OpenAI SDK debug logging
- Add verify_ssl config option for corporate proxies with self-signed certs
- Suppress openai/httpx debug loggers that dump full request bodies
  (including prompt content) — security requirement
2026-03-23 12:56:04 +01:00
Petr
95358448e6 Add modular LLM connector for Corporate Memory
Replace hardwired Anthropic API calls with a pluggable provider system.
Each deployment configures its AI provider in instance.yaml — switching
between Anthropic, LiteLLM, OpenRouter, or any OpenAI-compatible proxy
is a config change, not a code change.

New connectors/llm/ module:
- StructuredExtractor Protocol with extract_json() interface
- AnthropicExtractor: direct Anthropic SDK with retry + backoff
- OpenAICompatExtractor: any OpenAI-compatible proxy with three-layer
  structured output fallback (json_schema -> json_object -> prompt)
- Configurable structured_output policy (strict/json/auto)
- Custom exception hierarchy (auth/rate_limit/timeout/format/refusal)
- Zero secrets in logs: no API keys, prompts, or responses logged

Reviewed by: Google Gemini, Claude Sonnet, OpenAI GPT-5.4.
Security audit passed with all critical findings resolved.
2026-03-23 12:08:33 +01:00
Petr
84d14da611 Fix remote query UX: file-based stdin, ssh permissions, deprecation
Session testing revealed 3 issues with remote queries:

1. CLAUDE.md template recommended `cat <<HEREDOC | ssh ...` but
   claude_settings.json had `cat` in deny list, causing 2-3 failed
   attempts per query. Replaced with file-based approach: Write tool
   creates JSON file, then `ssh ... < file` avoids the cat deny.

2. ssh/scp commands were not in the allow list, requiring manual
   approval for every remote query. Added both to allow list.

3. DuckDB fetch_arrow_table() emitted DeprecationWarning on every
   parquet export. Replaced with .arrow().read_all().

Also added instruction for proactive hybrid analysis when remote
tables are available (agent was only using local data until asked).
2026-03-21 18:41:43 +01:00
Petr
8c6c162417 Fix: --sql not required when --stdin is used
argparse was rejecting --stdin mode because --sql was required=True.
Changed to required=False with runtime validation in main().
2026-03-21 12:17:02 +01:00
Petr
67df4acd73 Add --stdin JSON mode to avoid shell escaping nightmare
Agent was failing 3x on SSH commands due to backticks (BQ table names)
and single quotes (SQL string literals) getting mangled by nested shell
interpretation (local -> SSH -> bash -> Python).

New --stdin mode reads query spec as JSON from stdin via heredoc:
  cat <<'QUERY' | ssh alias 'bash remote_query.sh --stdin'
  {"register_bq": {"alias": "SELECT ... FROM \`table\` ..."}, "sql": "..."}
  QUERY

Heredoc with <<'QUERY' (quoted) passes everything literally -- no
escaping needed for backticks, quotes, or parentheses.

Updated claude_md_template.txt to use --stdin as the primary method.
2026-03-21 12:15:50 +01:00
Petr
39763ea5a2 Fix: load instance.yaml without requiring webapp secrets
Analysts don't have WEBAPP_SECRET_KEY, so load_instance_config()
validation failed with noisy warnings. Now reads instance.yaml
directly with yaml.safe_load, skipping secret validation.
2026-03-21 12:01:41 +01:00
Petr
dfec39722b Fix remote_query.sh: use analyst-readable env file
GCP OS Login doesn't honor /etc/group changes for SSH sessions,
so analyst can't read /opt/data-analyst/.env even after usermod.

Wrapper now reads .remote_query.env from scripts dir (dataread group),
falls back to .env for admin users. The env file contains only
non-secret BQ config (project ID, location, data dir).
2026-03-21 11:59:57 +01:00
Petr
dce8454894 Add remote_query.sh wrapper, fix analyst SSH permissions
Analyst user (foundry_e_psimecek) couldn't access /opt/data-analyst/.
Added to data-ops group on server.

New scripts/remote_query.sh wrapper handles env setup (PYTHONPATH,
CONFIG_DIR, .env) so agents use simple:
  ssh alias 'bash ~/server/scripts/remote_query.sh --sql "..." --format table'

Updated claude_md_template.txt to use wrapper instead of raw commands.
2026-03-21 11:58:04 +01:00
Petr
ed5a5ec706 Fix: duckdb_manager CONFIG_DIR support for server deployment
find_project_root() and parse_data_description() now check CONFIG_DIR
env var first when looking for data_description.md. On server deployment,
data_description.md lives in instance/config/ (CONFIG_DIR), not in the
OSS repo's docs/ directory.
2026-03-21 11:40:55 +01:00
Petr
d180b2014e Step 28: Remote query architecture for local+remote table JOINs
Add src/remote_query.py CLI module enabling the AI agent to run SQL
queries spanning local Parquet tables and remote BigQuery tables in a
single DuckDB session on the server. Two-phase protocol: BQ sub-queries
(--register-bq) fetch filtered/aggregated data, then DuckDB SQL (--sql)
joins everything.

Safety: COUNT(*) pre-check, memory estimation (2GB cap), row limits
(500K per BQ sub-query, 100K final result).

Changes:
- New src/remote_query.py with CLI, BQ registration, output formatting
- Add bq_entity_type field to TableConfig (view vs table routing)
- Extract create_local_views() from duckdb_manager.py for reuse
- Update claude_md_template.txt with remote query agent instructions
- Update example configs with remote_query section and docs
- 52 new tests (42 remote_query + 10 bq_entity_type), all passing
2026-03-21 11:39:15 +01:00
Petr
ed16122994 Use data_product config for metric discovery instead of filter_tag in webapp 2026-03-18 16:10:15 +01:00
Petr
e63c8747b5 Fix metric expression extraction: use 'code' field
OpenMetadata stores SQL in metricExpression.code, not .expression.
This caused all metric expressions to export as empty strings.
2026-03-18 13:01:23 +01:00
Petr
908d1f2247 Fix search_by_data_product: client-side filtering
OpenMetadata search API ignores queryFilter for dataProducts field.
Use type-specific index + client-side filtering by dataProducts
membership instead. Correctly returns 16/32 metrics for FoundryAI.
2026-03-18 12:54:59 +01:00
Petr
fb63a72a98 Add data product discovery, fix remove-analyst script
- client.py: add search_by_data_product() for OpenMetadata search API
- catalog_export.py: prefer data product discovery over tag filtering
  (finds all 16 metrics in FoundryAIDataModel vs 3 with tag filter)
- remove-analyst: fix GROUPS bash variable collision, improve messaging
2026-03-18 12:52:41 +01:00
Petr
ab99f0af92 Fix sync_schedule validation to accept multi-time daily format
The scheduler.py already supported "daily HH:MM,HH:MM,HH:MM" format
(commit 5f27d05), but config.py validation regex only accepted single
time "daily HH:MM", causing data-refresh to crash on startup.

Also adds:
- tests/test_config_sync_schedule.py (16 test cases)
- Makefile with validate-config target for CI/CD integration
2026-03-17 13:21:14 +01:00
Petr
5f27d05894 Support multiple daily sync times (e.g., "daily 07:00,13:00,18:00")
Scheduler now accepts comma-separated HH:MM times in daily schedules.
Each time slot is independently evaluated - if any slot has passed and
last_sync is before it, the table is marked as due.

This lets tables sync multiple times per day to pick up data refreshes
that happen throughout the day (e.g., Keboola pipelines running 3x/day).
2026-03-16 23:09:48 +01:00
Petr
f19ff10e1a Fix: don't update last_sync when partitioned sync gets 0 new rows
When BQ returns empty results (e.g., data not yet refreshed), the
scheduler was marking sync as complete for the day. This meant the
next 15-min tick would skip it ("none are due") and data would stay
stale until the next day's scheduled run.

Now: if partitioned sync processes partitions but gets 0 new rows,
last_sync is NOT updated. The scheduler will retry on the next tick
(15 min later) when data may be available.
2026-03-16 23:01:35 +01:00
Petr
6c0abf275b Add cache busting to metric_modal.css include 2026-03-16 22:16:37 +01:00
Petr
9be22fdc82 Fix metric display: use displayName in list, render HTML in modal
List view:
- Show display_name ("M1 + VFM Operational") instead of name ("M1PlusVFMOperational")
- Strip HTML and truncate description for clean list excerpts

Modal detail:
- Render original HTML from catalog instead of stripped plain text
- Add .om-description CSS class for structured HTML (bold labels, lists, code)
- Pass description_html alongside plain text description for backwards compat
2026-03-16 22:11:58 +01:00
Petr
ad525a96aa Filter catalog metrics by configurable tag (e.g., AIAgent.FoundryAI)
Add filter_tag support to catalog_export and webapp so only metrics
with the required tag are exported to YAML and displayed in UI.
Previously all 19+ metrics were exported regardless of relevance.

- Add has_tag() helper to transformer module
- catalog_export.py: filter_tag parameter from instance.yaml openmetadata config
- webapp/app.py: filter metrics in _load_metrics_from_catalog()
- 7 new tests (has_tag, filter_tag export, stale cleanup)
2026-03-16 22:03:53 +01:00
Petr
440662c8fe Fix remove-analyst silent failure caused by set -e + pipefail
The script was exiting silently on the GROUPS=$(groups ... | cut ...)
line — set -eo pipefail caused bash to terminate the script before any
echo output, making it appear to do nothing.

Replace set -euo pipefail with set -u and explicit error handling.
Admin scripts must always report what happened, never exit silently.

Also: use id -nG instead of groups|cut pipe, add verification step
after userdel, and log each operation for visibility.
2026-03-15 14:17:39 +01:00
Petr
2181d490e9 Fix systemd NAMESPACE failures caused by missing ReadWritePaths dirs
data-refresh.service: use /tmp instead of /tmp/data_analyst_staging in
ReadWritePaths — the subdirectory may not exist at service start, causing
mount namespace setup to fail before any Exec* directive runs.

deploy.sh: fix typo services/corporate-memory -> services/corporate_memory
so the mkdir conditional actually matches the repo directory name.

deploy.sh: add ReadWritePaths validation loop that auto-creates any missing
directories listed in installed .service files before daemon-reload. This
acts as a safety net against future NAMESPACE failures from new services.
2026-03-15 11:40:11 +01:00
Petr
80c5b902e0 Add scheduled data sync and catalog refresh with systemd timers
- New sync_schedule and profile_after_sync fields in TableConfig
  (formats: "every 15m", "every 1h", "daily 05:00")
- New src/scheduler.py with schedule evaluation logic (is_table_due)
- New --scheduled mode in data_sync.py: only syncs tables that are due,
  respects profile_after_sync flag, auto-restarts webapp after profiling
- Systemd timer+service for data-refresh (every 15 min)
- Systemd timer+service for catalog-refresh (every 15 min)
- deploy.sh enables new timers automatically
- Complete table config reference in data_description.md.example
- 58 new scheduler tests
2026-03-15 02:16:31 +01:00
Petr
d9f3977028 URL-encode FQN in catalog header links (spaces -> %20) 2026-03-15 02:06:22 +01:00
Petr
60039c0af3 Add direct catalog URL to YAML header (metric/table entity links)
Source line now links directly to the entity in OpenMetadata:
- metrics: https://datacatalog.../metric/UniqueVisitors
- tables:  https://datacatalog.../table/bigquery.project.dataset.table
2026-03-15 02:03:27 +01:00
Petr
ab1a93ed67 Strip HTML tags from OpenMetadata descriptions in YAML export
OpenMetadata stores descriptions as rich HTML (<p>, <strong>, &nbsp;, etc.).
Add strip_html() to transformer that converts to clean plain text for YAML
files consumed by Claude Code agent. Applied to metric descriptions, table
descriptions, and column descriptions. Webapp display dict keeps raw HTML
since the modal renders it correctly.
2026-03-15 01:57:04 +01:00
Petr
985f47cdb7 Add catalog export: generate YAML metrics and tables from OpenMetadata
- New `connectors/openmetadata/transformer.py` with shared parsing logic
  for extracting categories, grain, dimensions, expressions from OM tags
- New `src/catalog_export.py` script (python -m src.catalog_export) that
  fetches metrics/tables from OpenMetadata API and writes YAML files to
  /data/docs/metrics/ and /data/docs/tables/ for agent consumption
- Refactor webapp/app.py to delegate to transformer (with inline fallback)
- Add `fields` parameter to client.get_metrics() and get_metric_by_fqn()
  for fetching tags+owners in a single API call
- Fix pre-existing mock bug in test_openmetadata_enricher (base_url)
- 101 new tests (80 transformer + 21 export), all passing
2026-03-15 01:15:30 +01:00
Petr
e17dd85504 Remove hardcoded Jira/Keboola references from sync_data.sh
- Silent fallback when no sync settings exist (no 'Jira disabled' message)
- Generic dataset exclude/include loop driven by sync_settings.yaml
- Generic cleanup loop for disabled datasets
- Replaces 100+ lines of hardcoded Jira/kbc_telemetry_expert blocks
2026-03-15 01:02:37 +01:00