Phase 1 - Internal reference cleanup:
- Delete dev_docs/meetings/ (internal meeting notes/transcripts)
- Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic
- Replace "Internal AI Data Analyst" with "AI Data Analyst"
- Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst
- Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs
Phase 2 - Deployment hardening:
- Tighten sudoers wildcards to explicit paths (visudo, sudoers cp)
- setup.sh creates all groups (data-ops, dataread, data-private) and deploy user
- webapp-setup.sh copies sudoers-webapp from repo instead of inline definition
- deploy.sh conditional copy for data_description.md (not in git for OSS)
- deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples}
Phase 3 - Config and misc:
- Add ${ENV_VAR} interpolation to config/loader.py
- Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.)
- Create config/.env.template for secret values
- Add MIT LICENSE
- Fix .gitignore: add .venv/, docs/data_description.md
- Fix README.md: mark CSV status as Planned, remove metrics/, update license text
- Translate Czech comments in requirements.txt to English
- Fix test_account_service.py: mock username mapping instead of relying on instance config
All 118 tests pass.
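
The ${ENV_VAR} interpolation added to config/loader.py in Phase 3 could look roughly like the sketch below. This is not the actual loader code. The function name interpolate_env, the placeholder pattern, the choice to raise KeyError on an unset variable, and the GOOGLE_CLIENT_SECRET key are all assumptions made for illustration.

```python
import os
import re
from typing import Any, Mapping

# Matches ${NAME}, where NAME looks like a conventional environment variable name.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${NAME} placeholders in strings with values from env.

    Hypothetical helper: the real config/loader.py may differ.
    """
    if isinstance(value, str):
        def _lookup(match: re.Match) -> str:
            name = match.group(1)
            if name not in env:
                # Assumption: fail loudly rather than leave the placeholder in place.
                raise KeyError(f"environment variable {name!r} is not set")
            return env[name]

        return _ENV_PATTERN.sub(_lookup, value)
    if isinstance(value, dict):
        return {key: interpolate_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [interpolate_env(item, env) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value


# Example with a parsed YAML-like structure (keys are illustrative only).
config = {"auth": {"client_secret": "${GOOGLE_CLIENT_SECRET}"}, "port": 8000}
resolved = interpolate_env(config, env={"GOOGLE_CLIENT_SECRET": "s3cret"})
```

Passing env explicitly, as in the example, keeps the helper easy to test without touching the real process environment. Secrets would then live in a file such as config/.env.template's filled-in copy rather than in instance.yaml.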
# Data source adapters (install only what you need)
kbcstorage>=0.9.0 # For Keboola adapter

# Data processing
# pandas - core tabular data processing library
# pyarrow - Parquet format support and fast operations
# pytz - timezone support required by DuckDB for reading timezone-aware Parquet columns
pandas>=2.0.0
pyarrow>=12.0.0
pytz>=2024.1

# Analytical database
# DuckDB - in-process SQL OLAP database for analytical queries
duckdb>=0.9.0

# Configuration
# python-dotenv - loading environment variables from .env files
# pyyaml - parsing YAML configuration from data_description.md
python-dotenv>=1.0.0
pyyaml>=6.0

# Progress tracking and logging
# tqdm - progress bars for long-running operations (download, sync)
tqdm>=4.65.0

# Web application (Google SSO portal)
# flask - web framework for self-service portal
# authlib - OAuth 2.0 / OpenID Connect library for Google SSO
# gunicorn - WSGI server for production deployment
flask>=3.0.0
authlib>=1.3.0
gunicorn>=21.0.0

# Telegram notification bot
# httpx - async HTTP client for Telegram API and unix socket communication
# aiohttp - async HTTP server for bot's internal send API
httpx>=0.27.0
aiohttp>=3.9.0
matplotlib>=3.8.0
numpy>=1.24.0

# Desktop app authentication
# PyJWT - JWT token creation and validation for desktop app auth
PyJWT>=2.8.0

# Password authentication for external users
# argon2-cffi - modern password hashing algorithm (Argon2id)
# sendgrid - email service for setup/reset links
argon2-cffi>=23.1.0
sendgrid>=6.11.0

# Corporate Memory knowledge extraction
# anthropic - Claude API client for HAIKU-based knowledge extraction
anthropic>=0.39.0