agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	a65de8574e	feat: add import_from_yaml and export_to_yaml to MetricRepository Adds YAML-based bulk import/export to MetricRepository, supporting list-wrapped and plain-dict YAML formats, table→table_name field mapping, and sql_by_* → sql_variants collection (and reverse on export). All 24 tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 19:25:11 +02:00
ZdenekSrotyr	88d536ca29	feat: add MetricRepository with full CRUD and search for metric_definitions Implements MetricRepository following the table_registry pattern — raw SQL, dict returns, ON CONFLICT upsert, and json.dumps for sql_variants/validation. Includes 18 tests covering create, read, list, update, delete, find_by_table, find_by_synonym, and get_table_map.	2026-04-10 19:21:25 +02:00
ZdenekSrotyr	cc1445f7ed	fix: use SCHEMA_VERSION constant in v3-to-v4 migration test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 19:18:57 +02:00
ZdenekSrotyr	bc394bd266	feat: schema migration v3→v4 with metric_definitions and column_metadata tables Add SCHEMA_VERSION = 4, _V3_TO_V4_MIGRATIONS list, and if current < 4 block in _ensure_schema(). Both new tables are also added to _SYSTEM_SCHEMA for fresh installs. Tests cover fresh install, all columns, and v3→v4 migration path.	2026-04-10 19:14:32 +02:00
ZdenekSrotyr	6c53082295	feat: multi-instance deployment — all 14 must-have items from spec CalVer CI (release.yml) with stable/dev channels, health endpoint with version/channel/schema_version, JWT secret auto-generation with file persistence, smoke test script + Docker-in-CI, pre-migration snapshot, /api/admin/configure for headless setup, /api/admin/discover-and-register, /setup wizard, OpenAPI snapshot test, custom connector mount support, CHANGELOG, migration safety tests, startup banner. 663 tests pass (6 new migration safety + 3 OpenAPI snapshot + 1 updated JWT test).	2026-04-10 11:57:42 +02:00
ZdenekSrotyr	cf59abe6dd	fix: update tests to provide password after OAuth token bypass fix	2026-04-09 16:35:15 +02:00
ZdenekSrotyr	2043594670	fix: restrict script execution endpoints to analyst/admin roles deploy, run, and run-deployed require analyst; undeploy requires admin. Update test to use admin token for undeploy.	2026-04-09 16:31:42 +02:00
ZdenekSrotyr	449053bf8a	fix: enforce per-table access control on catalog profile endpoints Add can_access_table check to GET /api/catalog/profile/{table_name} and POST /api/catalog/profile/{table_name}/refresh, returning 403 for unauthorized tables. Update test_api_complete to cover new 403 behaviour and fix the existing 404 test to use admin token.	2026-04-09 16:30:24 +02:00
ZdenekSrotyr	ad6b3a96e4	fix: enforce role guards on admin web pages Add require_role(Role.ADMIN) to /admin/tables and /admin/permissions, and require_role(Role.KM_ADMIN) to /corporate-memory/admin so that non-admin users receive 403 instead of being served the page. Fix admin_cookie test fixture to supply a password_hash (required since the /auth/token endpoint blocks passwordless requests). Add analyst fixture and TestAdminRoleGuards tests verifying analysts get 403 and admins get 200 on the protected routes.	2026-04-09 16:30:13 +02:00
ZdenekSrotyr	3205a8d300	fix: block /auth/token for OAuth-only users without password_hash Users without a password_hash (Google OAuth / magic-link accounts) could obtain a JWT by simply posting their email to /auth/token. Add an else clause that rejects such requests with 401, directing them to their configured auth provider. Update and extend tests accordingly.	2026-04-09 16:29:47 +02:00
ZdenekSrotyr	55515266ea	fix: block DuckDB metadata functions and relative paths in query endpoint Add information_schema, duckdb_* introspection functions, pragma_* functions, and relative path traversal patterns to the SQL blocklist so users cannot enumerate schema metadata regardless of RBAC. Add six corresponding tests.	2026-04-09 16:29:11 +02:00
ZdenekSrotyr	afa84f6585	fix: web UI smoke tests — reset DuckDB singleton, get token via API	2026-04-09 07:18:17 +02:00
ZdenekSrotyr	5131816a5b	test: add missing coverage for web UI, Jira extract, instance config, and concurrent rebuild - tests/test_web_ui.py: smoke tests for all authenticated web pages (login, dashboard, catalog, corporate-memory, activity-center, admin/tables, admin/permissions) - tests/test_jira_service.py: unit tests for extract_init and update_meta in the Jira connector - tests/test_instance_config.py: verifies get_instance_name() returns a string when config file is absent - tests/test_orchestrator.py: concurrent rebuild test asserting rebuild succeeds while a read-only connection holds the analytics DB	2026-04-09 07:15:14 +02:00
ZdenekSrotyr	8df8183a9f	feat: add 50 MB upload size limit to session and artifact endpoints Rejects files exceeding MAX_UPLOAD_SIZE with HTTP 413 before writing to disk.	2026-04-09 07:14:16 +02:00
ZdenekSrotyr	8e9a0c367a	fix: replace os.environ direct assignment with monkeypatch.setenv in test fixtures Prevents environment variable leaking between tests. All DATA_DIR, JWT_SECRET_KEY, and SCRIPT_TIMEOUT assignments in fixtures now use monkeypatch.setenv() which auto-reverts after each test. Removes manual os.environ.pop() cleanup lines.	2026-04-09 07:11:36 +02:00
ZdenekSrotyr	53a9e838f9	feat: add graceful shutdown handler - Add close_system_db() function in src/db.py to cleanly close shared DB connection - Add lifespan context manager in app/main.py to trigger shutdown on app exit - Integrate lifespan into FastAPI app initialization - All API tests pass (77/77)	2026-04-09 07:03:45 +02:00
ZdenekSrotyr	1b3acce7e9	fix: replace substring table access check with word-boundary regex Replace substring matching with word-boundary regex in query endpoint's table access validation. Prevents false positives where short table names like 'id' would block any query containing the word. Uses re.escape() to safely handle special characters in table names. - Import re module at top - Use regex pattern with word boundaries (\b) for matching - Add tests to verify no false positives and proper blocking	2026-04-09 07:00:48 +02:00
ZdenekSrotyr	3e9c347cf1	fix: validate extract dir name in get_analytics_db_readonly to prevent SQL injection Adds _SAFE_IDENTIFIER regex guard before ATTACHing extract.duckdb files in the read-only analytics connection, matching the same fix already applied in the orchestrator. Adds test coverage for malicious directory names.	2026-04-09 06:57:31 +02:00
ZdenekSrotyr	e425d4baa5	fix: handle WAL files in atomic swap to prevent DB corruption Add _atomic_swap_db helper that removes stale WAL files before and after moving the temp DuckDB into place. Apply CHECKPOINT before close in both orchestrator and Keboola extractor so DuckDB flushes WAL before the swap.	2026-04-09 06:57:29 +02:00
ZdenekSrotyr	3321d2e266	security: reduce JWT expiry to 24h and add jti claim Tokens previously lasted 30 days with no revocation path. Expiry is now 24 hours and every token carries a unique jti (UUID hex) to support future revocation checks.	2026-04-09 06:57:23 +02:00
ZdenekSrotyr	23ae6a602c	security: harden query endpoint SQL blocklist and disable external access Expand blocked keywords to cover parquet_scan, read_csv_auto, query_table, iceberg_scan, delta_scan, call, URL schemes (http/https/s3/gcs), and additional file-scan functions. Set enable_external_access=false on the non-read-only analytics connection path. Add three new tests covering parquet_scan, read_csv_auto, and query_table blocking.	2026-04-09 06:54:58 +02:00
ZdenekSrotyr	4aa97c23d2	fix: raise RuntimeError on missing JWT_SECRET_KEY in non-test environments Prevents production deployments from silently using a hardcoded default secret. TESTING=1 still resolves to a built-in test key so the existing test suite is unaffected. Adds a test that verifies the RuntimeError is raised when neither JWT_SECRET_KEY nor TESTING is set.	2026-04-09 06:54:29 +02:00
ZdenekSrotyr	0d3ab5060c	fix: reject unsafe SQL identifiers in orchestrator Adds _validate_identifier() with ^[a-zA-Z_][a-zA-Z0-9_]{0,63}$ regex and applies it to source_name (directory names), table_name (_meta rows), and alias/extension (_remote_attach rows) before any SQL interpolation. Adds two tests covering SQL-injection directory names and malicious _meta entries.	2026-04-09 06:51:07 +02:00
ZdenekSrotyr	cb9c566d07	fix: rebuild_source delegates to full rebuild to preserve all source views _do_rebuild_source was creating a fresh temp DB with only one source, then atomically replacing analytics.duckdb — wiping views from every other source. Now it delegates to _do_rebuild so all extract dirs are re-attached in a single pass. Adds test_rebuild_source_preserves_other_sources to guard the regression.	2026-04-09 06:48:25 +02:00
ZdenekSrotyr	94c6b0f839	fix: require password verification when user has password_hash in /auth/token Previously the password check was gated on both user.password_hash and request.password being truthy, so an attacker could omit the password field (which defaults to "") and receive a valid JWT. Now any user with a stored hash must supply a non-empty password that passes argon2 verification. Adds six TestTokenEndpoint tests covering empty, missing, wrong, and correct password, plus no-hash user and unknown user cases.	2026-04-09 06:44:31 +02:00
ZdenekSrotyr	3ba207a7f8	feat: add _remote_attach to BigQuery extractor, support token-less ATTACH in orchestrator BigQuery extension handles auth via GOOGLE_APPLICATION_CREDENTIALS env var, so _remote_attach uses empty token_env. Orchestrator now supports both token-based (Keboola) and env-based (BigQuery) authentication modes.	2026-04-08 18:13:31 +02:00
ZdenekSrotyr	06e1cf0a8d	feat: generic _remote_attach contract for remote DuckDB extension views Extractors with remote tables now write a _remote_attach table into extract.duckdb so the orchestrator can re-ATTACH external extensions at query time. The mechanism is source-agnostic — any connector can use it. - Keboola extractor writes _remote_attach + creates views on kbc.* - Orchestrator reads _remote_attach, installs extension, reads token from env - Graceful degradation: missing token → warning, local tables still work	2026-04-08 18:10:12 +02:00
ZdenekSrotyr	05a1b452e9	security: harden query (read-only DB), uploads (path sanitization), scripts (AST validation)	2026-04-08 12:09:19 +02:00
ZdenekSrotyr	67a1e0bb45	feat: Jira webhook FastAPI adapter — replaces Flask Blueprint	2026-04-08 07:04:50 +02:00
ZdenekSrotyr	4d1acd014a	refactor: remove legacy webapp + add missing tests + housekeeping Phase A: Close fixed issues (#7, #8, #9), add server/ user/ to .gitignore, increase extractor timeout to 30 min. Phase B: Add 10 new tests — access request lifecycle (4), CLI admin commands (5), sync subprocess trigger (1). 578 tests passing. Phase C: Delete entire webapp/ directory (24,800 lines) — legacy Flask app fully replaced by FastAPI app/. Fix auth providers to use app.instance_config instead of webapp.config. Update CLAUDE.md. Delete 6 webapp-only test files. Fix Jira service config imports.	2026-03-31 13:44:06 +02:00
ZdenekSrotyr	1074d5ec49	feat: implement data access control — table-level permissions Schema v3: add is_public column to table_registry (default true). src/rbac.py: can_access_table() checks admin bypass, public flag, explicit permissions, wildcard bucket permissions. API enforcement: - manifest: filters tables by user access - download: 403 if no access - catalog: filters table list - query: validates referenced tables against allowed list New admin permissions API (/api/admin/permissions) for grant/revoke. 28 access control tests + 733 total tests passing.	2026-03-31 12:33:31 +02:00
ZdenekSrotyr	617e724d21	feat: add E2E test suite — API, extractor, Docker tests/conftest.py: shared fixtures (e2e_env, seeded_app, create_mock_extract) tests/test_e2e_api.py: 11 tests — full sync flow, RBAC, table lifecycle tests/test_e2e_extract.py: 6 tests — Keboola/BQ/Jira pipelines, multi-source, corrupt handling tests/test_e2e_docker.py: 3 tests — Docker health + full flow (opt-in via -m docker) Fix admin update route (duplicate id kwarg, .dict() → .model_dump()). 705 tests passing.	2026-03-31 08:18:54 +02:00
ZdenekSrotyr	b0eaef88cc	refactor: delete old server infra — 4,200 lines removed Remove all legacy deployment infrastructure replaced by Docker + Kamal: - server/ directory (deploy.sh, setup.sh, webapp-setup.sh, sudoers, nginx config, systemd units, bin scripts) - scripts/sync_data.sh (replaced by da sync + API) - All services/*/systemd/ files (replaced by docker-compose) - tests/test_deploy_guard.py and tests/test_sync_data.py 688 tests passing.	2026-03-31 08:06:41 +02:00
ZdenekSrotyr	caa60a507d	feat: add centralized RBAC module — replace Linux group auth New src/rbac.py: Role enum, hierarchy, get_user_role(), has_role(), is_admin(), is_km_admin(), has_dataset_access(), set_user_role(). webapp/auth.py: admin_required + km_admin_required now use DuckDB roles instead of Linux groups (pwd.getpwnam + sudo/data-ops check). app/auth/dependencies.py: imports Role from src/rbac.py (single source). 11 RBAC tests passing.	2026-03-31 08:04:35 +02:00
ZdenekSrotyr	b502bd8bdd	refactor: delete old sync pipeline — 9,500 lines removed Phase 5 cleanup: remove all code replaced by extract.duckdb architecture. Deleted modules: - src/config.py (653) — replaced by DuckDB table_registry - src/parquet_manager.py (755) — replaced by DuckDB COPY TO - src/data_sync.py (734) — replaced by SyncOrchestrator - src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH - src/table_registry.py (464) — replaced by DuckDB repository - connectors/keboola/adapter.py (820) — replaced by extractor.py - connectors/bigquery/adapter.py (665) — replaced by extractor.py - connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension Updated all imports in webapp, catalog_export, enricher, router, sync_settings_service, generate_sample_data. Kept keboola/client.py as fallback (removed src.config dependency). 704 tests passing.	2026-03-31 07:50:37 +02:00
ZdenekSrotyr	9f20529f10	fix: resolve 7 preexisting test failures - Remove iCloud duplicate files (test_db 2.py, src/db 2.py) - Fix metrics expression fallback to top-level field in transformer + webapp - Fix sync_data.sh rsync exception pattern for $SSH_HOST variable - Fix deploy_guard cp regex to skip shell variable expansions - Update sudoers-deploy with missing root:data-ops rules - Update CRITICAL_DIRS ownership expectations to match deploy.sh reality 913 tests passing, 0 failures.	2026-03-30 20:36:00 +02:00
ZdenekSrotyr	18e5f0b6e8	feat: implement extract.duckdb contract — orchestrator + extractors Phase 0: extend table_registry schema (v1→v2 migration), add source_type/bucket/source_table/query_mode columns. Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into master analytics.duckdb. Keboola extractor uses DuckDB extension with legacy client fallback. BigQuery extractor is remote-only via DuckDB BQ extension (no data download). 62 tests passing.	2026-03-30 20:12:56 +02:00
ZdenekSrotyr	bca5e91826	feat: add bootstrap endpoint + deploy skill for AI agents - POST /auth/bootstrap — creates first admin, self-deactivates after - da setup bootstrap — CLI command for agent-driven setup - da setup verify — structured health check (JSON output for agents) - cli/skills/deploy.md — complete deployment guide for AI agents - 6 bootstrap tests including full agent deployment flow simulation - 156 total tests passing	2026-03-30 14:01:01 +02:00
ZdenekSrotyr	1a7939c594	feat: add auth providers (Google OAuth, Password, Email magic link) + web UI fixes - Google OAuth with authlib + auto user creation + cookie-based JWT - Password auth with argon2 hash + setup token flow - Email magic link with SMTP/SendGrid support - Cookie-based auth for web UI (after OAuth redirect) - Dashboard template compatibility (user_info, activity, desktop status) - 150 tests passing	2026-03-27 17:07:59 +01:00
ZdenekSrotyr	1287e63ed9	feat: complete system — web UI, all API endpoints, governance, admin, CLI commands Major additions: - Web UI: Jinja2 templates in FastAPI (login, dashboard, catalog, corporate memory, admin) - API: catalog profiles/metrics, telegram verify/unlink/status, admin table registry CRUD - Corporate memory governance: approve/reject/mandate/revoke/edit/batch + audit log - Sync: real DataSyncManager trigger, sync-settings, table-subscriptions - CLI: setup (init/test/deploy/verify), server (logs/restart/deploy/backup), explore - Instance config integration (instance.yaml loaded at startup) - 140 tests passing (25 new)	2026-03-27 16:52:22 +01:00
ZdenekSrotyr	c5527ec153	fix: harden script sandbox and SQL query security Fixes found by E2E QA agent: - Script sandbox: block os, sys, socket, eval, exec, open, __import__, getattr, pathlib and 20+ other dangerous patterns - SQL query: block COPY, ATTACH, read_csv, semicolons, non-SELECT - Added 24 security tests covering all attack vectors	2026-03-27 16:11:05 +01:00
ZdenekSrotyr	e0ce91ddb9	feat: add dataset permissions, script execution, Kamal config, CI/CD - SyncSettingsRepository + DatasetPermissionRepository with RBAC - Script deploy/run/undeploy API with import sandboxing - User sync settings API with permission checks - 4 CLI skills (connectors, security, notifications, corporate-memory) - Kamal production + staging configs - GitHub Actions CI + deploy workflows - 91 total tests passing	2026-03-27 15:40:11 +01:00
ZdenekSrotyr	3701130a11	feat: add Docker, CLI tool, scheduler, and agent skills - Dockerfile (uv-based) + docker-compose.yml (3 services) - CLI tool 'da' with commands: auth, sync, query, status, admin, diagnose, skills - Scheduler sidecar service (replaces systemd timers) - pyproject.toml for uv distribution - Built-in skills (setup, troubleshoot) for AI agents - 17 CLI tests, 75 total tests passing	2026-03-27 15:30:03 +01:00
ZdenekSrotyr	a3918d3833	feat: add FastAPI server with auth, RBAC, and all API endpoints - JWT auth with role-based access control (viewer/analyst/admin/km_admin) - Endpoints: health, sync manifest, data download, query, users CRUD, corporate memory, session/artifact upload - 18 API tests covering auth, RBAC, all endpoints	2026-03-27 15:19:18 +01:00
ZdenekSrotyr	64acc8d731	feat: add JSON to DuckDB migration script with tests	2026-03-27 15:09:06 +01:00
ZdenekSrotyr	79b0b66f2e	feat: add DuckDB state layer with all repository classes - src/db.py: schema with 14 tables matching design spec - 7 repository classes: SyncState, Users, Knowledge, Audit, Telegram, PendingCode, Script, TableRegistry, Profiles - 37 tests covering all CRUD operations	2026-03-27 15:06:55 +01:00
ZdenekSrotyr	f76411c603	feat: add DuckDB state layer with schema management Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 13:55:54 +01:00
Petr	1318b74ff1	Add Corporate Memory governance — Phase 1 (data model + admin API) Add admin curation layer between AI extraction and knowledge distribution. Admins (km_admin flag in instance.yaml) can approve, reject, mandate, and revoke knowledge items. Mandatory items distribute to all targeted users automatically. Three governance modes (configurable per instance): - mandatory_only: admin controls everything, no user voting - admin_curated: admin controls, users vote as feedback signal - hybrid: mandatory from admin + optional from user voting Three approval workflows: - review_queue: nothing published without admin approval - auto_publish: items go live immediately, admin intervenes retroactively - threshold: confidence-based auto-publish (Phase 5) Includes: - 9 admin action functions (approve/reject/mandate/revoke/edit/batch/...) - 11 new admin API endpoints under /api/corporate-memory/admin/ - Immutable audit log (audit.jsonl) - Audience targeting via groups - Automatic migration of existing items to "approved" status - km_admin_required auth decorator - 69 tests covering all governance logic - Backward compatible: no config = legacy wiki behavior	2026-03-23 19:15:33 +01:00
Petr	95358448e6	Add modular LLM connector for Corporate Memory Replace hardwired Anthropic API calls with a pluggable provider system. Each deployment configures its AI provider in instance.yaml — switching between Anthropic, LiteLLM, OpenRouter, or any OpenAI-compatible proxy is a config change, not a code change. New connectors/llm/ module: - StructuredExtractor Protocol with extract_json() interface - AnthropicExtractor: direct Anthropic SDK with retry + backoff - OpenAICompatExtractor: any OpenAI-compatible proxy with three-layer structured output fallback (json_schema -> json_object -> prompt) - Configurable structured_output policy (strict/json/auto) - Custom exception hierarchy (auth/rate_limit/timeout/format/refusal) - Zero secrets in logs: no API keys, prompts, or responses logged Reviewed by: Google Gemini, Claude Sonnet, OpenAI GPT-5.4. Security audit passed with all critical findings resolved.	2026-03-23 12:08:33 +01:00
Petr	8c6c162417	Fix: --sql not required when --stdin is used argparse was rejecting --stdin mode because --sql was required=True. Changed to required=False with runtime validation in main().	2026-03-21 12:17:02 +01:00
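
Commit 1b3acce7e9 above describes replacing a substring check with a word-boundary regex built with re.escape(). The repository's code is not shown in this log, so the following is only a minimal sketch of that technique; the function name referenced_tables and its arguments are hypothetical.

```python
import re


def referenced_tables(sql: str, known_tables: list[str]) -> set[str]:
    """Return the known table names that occur in `sql` as whole words.

    Hypothetical helper, not the repository's implementation.
    """
    found = set()
    for name in known_tables:
        # re.escape() neutralises regex metacharacters in the table name;
        # \b on both sides requires a word boundary, so a short name such
        # as "id" no longer matches inside "order_id".
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, sql, flags=re.IGNORECASE):
            found.add(name)
    return found
```

With a plain substring test, "id" would be reported for `SELECT order_id FROM orders`. With the pattern above it is not, because the underscore before "id" is a word character and so no boundary exists there.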

73 commits
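
Commit 0d3ab5060c above quotes the regex ^[a-zA-Z_][a-zA-Z0-9_]{0,63}$ used to reject unsafe SQL identifiers before interpolation. As a rough illustration of how such a guard might look (the function below is an assumed sketch, not the code from the repository):

```python
import re

# Pattern quoted in the commit message: a letter or underscore followed by
# up to 63 letters, digits, or underscores (64 characters at most).
_SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")


def validate_identifier(value: str) -> str:
    """Return `value` unchanged if it is a safe identifier, else raise.

    Hypothetical helper. fullmatch() is used so that a trailing newline,
    which `$` alone would tolerate, is also rejected.
    """
    if not _SAFE_IDENTIFIER.fullmatch(value):
        raise ValueError(f"unsafe SQL identifier: {value!r}")
    return value
```

A caller would run this on directory names, table names, and aliases before building an ATTACH or CREATE VIEW statement, so that a name such as `x; DROP TABLE users` fails early instead of reaching the SQL string.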