Commit graph

26 commits

Author SHA1 Message Date
ZdenekSrotyr
1a55167234 docs: workflow-driven VM recreate for startup-script propagation
- ONBOARDING.md: replace 'propagating module changes' section with two
  explicit options — workflow_dispatch with recreate_targets (recommended,
  CI audit trail), or local terraform apply -replace (emergency). Adds a
  'do not' section banning manual .env edits on VMs.
- deployment-log.md: iteration 4 summary (version badge + module v1.5.0 +
  workflow_dispatch).
2026-04-21 20:24:31 +02:00
ZdenekSrotyr
cdd959b19f docs(log): add iteration 3 — review, bootstrap fix, docs sweep, infra-v1.4.0 2026-04-21 20:09:13 +02:00
ZdenekSrotyr
6470e23df3 docs: finalize deployment log — iteration 2 summary 2026-04-21 19:11:07 +02:00
ZdenekSrotyr
03dd81c825 docs: update deployment log with final state and onboarding workflow
- Volume fix documented (Docker named volume → bind mount /data)
- Watchtower → cron-based auto-upgrade
- Final state snapshot of VMs, repos, tags, secrets
- Onboarding flow summary for 2nd customer
2026-04-21 16:51:20 +02:00
ZdenekSrotyr
e53de59a42 docs: multi-customer deployment spec + implementation plan
- Spec: pure self-deploy model with per-customer GCP project
- Public upstream repo with TF module; private template + per-customer repos
- Branch-aware dev VMs via dev_instances list
- Caddy TLS, Secret Manager for tokens, SA JSON key for CI (WIF follow-up)
- 6-phase implementation plan with bite-sized tasks
2026-04-21 15:25:17 +02:00
ZdenekSrotyr
bd6921c4d5 docs,tests: anonymize customer references
Replace identifying customer names and infrastructure URLs in
documentation and test fixtures with generic placeholders.
Test semantics preserved.
2026-04-21 11:56:19 +02:00
ZdenekSrotyr
51f60bbf91 docs: add comprehensive test suite implementation plan (8 tasks, 6 parallel blocks)
Covers shared infrastructure, API gaps, CLI gaps, services, connectors,
E2E journeys, Docker and live tests. Tasks 2-7 are independent for
parallel sub-agent dispatch.
2026-04-12 10:44:08 +02:00
ZdenekSrotyr
55d11920ef docs: add comprehensive test strategy spec (6 parallel blocks, 4 layers)
Covers gap analysis, 8 critical E2E journeys, shared test infrastructure,
Docker E2E and live test design for full project coverage.
2026-04-12 10:33:26 +02:00
ZdenekSrotyr
816168f96b docs: add remote query implementation plan (5 tasks)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:02:04 +02:00
ZdenekSrotyr
eb68e6292d docs: fix remote query spec after code review
- Address read-only LOAD uncertainty with verification step + workaround
- Clarify register_bq wraps BQ logic (not delegates to register_bq_table)
- Use existing max_bq_registration_rows config key name
- Apply SQL blocklist to both register_bq and final sql
- Define connection lifecycle (caller owns, try/finally)
- Fix CLI argument handling (optional positional + --sql flag)
- Document concurrency safety (Unix inode semantics)
- Handle missing google-cloud-bigquery gracefully

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 10:58:25 +02:00
ZdenekSrotyr
017cf07674 docs: add design spec for remote query (extension re-attach + two-phase BQ)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 10:52:39 +02:00
ZdenekSrotyr
06ac937f8b docs: add implementation plans for porting internal features
Three independent plans following TDD approach:
1. Business metrics (10 tasks) — schema v4, repository, CLI, API, starter pack, profiler integration
2. Analyst bootstrap (4 tasks) — da analyst setup, CLAUDE.md template, freshness check
3. Metadata writer (4 tasks) — column metadata repo, CLI, API, Keboola push

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:08:55 +02:00
ZdenekSrotyr
c57e195932 docs: fix design spec after code review
Addresses all Critical and Important issues found by reviewer:
- Fix schema migration details (_V3_TO_V4_MIGRATIONS, _ensure_schema chain)
- Add YAML-to-DuckDB field mapping table (table→table_name)
- Remove unexplained src/metrics.py from new files
- Fix API endpoint URLs (table/{id} → {table_id}, /api/data/tables → /api/catalog/tables)
- Commit to da analyst as top-level command (not sub-sub-command)
- Fix CLAUDE.local.md path to .claude/CLAUDE.local.md
- Remove duplicate --upload-local flag (--upload-only already exists)
- Detail profiler refactor call sites
- Add metrics API deprecation plan for catalog endpoint
- Use {metric_id:path} for slash-containing IDs
- Add --force flag and resume behavior for bootstrap
- Specify proposals directory path
- Simplify da metrics add to --file import

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:58:39 +02:00
ZdenekSrotyr
1ce632bc0b docs: add design spec for porting internal features to OSS
Covers business metrics in DuckDB, analyst bootstrap flow,
and metadata writer — based on comparison with internal repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:49:34 +02:00
ZdenekSrotyr
cce179f114 docs: add versioned tags per channel (dev-YYYY.MM.N, stable-YYYY.MM.N) 2026-04-10 06:44:25 +02:00
ZdenekSrotyr
4ea22232ef docs: multi-instance deployment and versioning design spec 2026-04-09 21:14:21 +02:00
ZdenekSrotyr
53f39bb38d chore: clean stale docs — rewrite architecture.md, remove old plans
- architecture.md rewritten for v2 (FastAPI, DuckDB, Docker) — removed
  all Flask/rsync/SSH/systemd references
- Deleted PLAN.md and REFACTORING_PLAN.md (completed, superseded)
- auto-install.md replaced with redirect to DEPLOYMENT.md
- Fixed absolute paths in superpowers plan doc
2026-04-09 09:06:13 +02:00
ZdenekSrotyr
1b219cabe9 fix: remove dead PRAGMA enable_wal code
DuckDB has used WAL by default since v0.8, so this pragma is not
valid DuckDB syntax. Removed obsolete try-except block that attempted
to enable WAL on system database initialization.
2026-04-09 06:59:57 +02:00
ZdenekSrotyr
89154d043b chore: clean repo for public release — fix references, remove drafts
- Replace padak/tmp_oss → keboola/agnes-the-ai-analyst in all docs, infra, CLI
- Replace your-org/ai-data-analyst → keboola/agnes-the-ai-analyst in README, Jira docs
- Remove real GCP project ID from terraform.tfvars.example
- Delete internal draft documents (dev_docs/draft/)
- Update infra/main.tf to clone from main branch
2026-04-08 19:27:25 +02:00
ZdenekSrotyr
92fbb88c15 chore: Docker prod config (Python 3.13, no reload), fix utcnow deprecation, update docs 2026-04-08 12:10:47 +02:00
ZdenekSrotyr
1074d5ec49 feat: implement data access control — table-level permissions
Schema v3: add is_public column to table_registry (default true).

src/rbac.py: can_access_table() checks admin bypass, public flag,
explicit permissions, wildcard bucket permissions.

API enforcement:
- manifest: filters tables by user access
- download: 403 if no access
- catalog: filters table list
- query: validates referenced tables against allowed list

New admin permissions API (/api/admin/permissions) for grant/revoke.

28 access control tests + 733 total tests passing.
2026-03-31 12:33:31 +02:00
ZdenekSrotyr
18e5f0b6e8 feat: implement extract.duckdb contract — orchestrator + extractors
Phase 0: extend table_registry schema (v1→v2 migration), add
source_type/bucket/source_table/query_mode columns.

Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into master
analytics.duckdb. Keboola extractor uses DuckDB extension with
legacy client fallback. BigQuery extractor is remote-only via
DuckDB BQ extension (no data download).

62 tests passing.
2026-03-30 20:12:56 +02:00
ZdenekSrotyr
0b9720d090 docs: rewrite core refactoring spec v2 — simplified extract.duckdb contract 2026-03-30 19:24:19 +02:00
ZdenekSrotyr
9ee7b3bd09 docs: add core refactoring design spec — DuckDB-centric extract architecture 2026-03-30 18:15:52 +02:00
ZdenekSrotyr
1287e63ed9 feat: complete system — web UI, all API endpoints, governance, admin, CLI commands
Major additions:
- Web UI: Jinja2 templates in FastAPI (login, dashboard, catalog, corporate memory, admin)
- API: catalog profiles/metrics, telegram verify/unlink/status, admin table registry CRUD
- Corporate memory governance: approve/reject/mandate/revoke/edit/batch + audit log
- Sync: real DataSyncManager trigger, sync-settings, table-subscriptions
- CLI: setup (init/test/deploy/verify), server (logs/restart/deploy/backup), explore
- Instance config integration (instance.yaml loaded at startup)
- 140 tests passing (25 new)
2026-03-27 16:52:22 +01:00
ZdenekSrotyr
07b396bfe2 docs: add refactoring plan, design spec, and gitignore updates 2026-03-27 15:42:57 +01:00