agnes-the-ai-analyst/config/loader.py
Petr 26c4e0934d OSS cleanup: remove internal references, harden deployment, add config env interpolation
Phase 1 - Internal reference cleanup:
- Delete dev_docs/meetings/ (internal meeting notes/transcripts)
- Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic
- Replace "Internal AI Data Analyst" with "AI Data Analyst"
- Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst
- Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs

Phase 2 - Deployment hardening:
- Tighten sudoers wildcards to explicit paths (visudo, sudoers cp)
- setup.sh creates all groups (data-ops, dataread, data-private) and deploy user
- webapp-setup.sh copies sudoers-webapp from repo instead of inline definition
- deploy.sh conditional copy for data_description.md (not in git for OSS)
- deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples}

Phase 3 - Config and misc:
- Add ${ENV_VAR} interpolation to config/loader.py
- Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.)
- Create config/.env.template for secret values
- Add MIT LICENSE
- Fix .gitignore: add .venv/, docs/data_description.md
- Fix README.md: CSV status Planned, remove metrics/, update license text
- Translate Czech comments in requirements.txt to English
- Fix test_account_service.py: mock username mapping instead of relying on instance config

All 118 tests pass.
2026-03-09 07:59:57 +01:00

133 lines
3.7 KiB
Python

"""
Instance configuration loader.
Loads instance.yaml from CONFIG_DIR (env var) or ./config/ fallback.
Used by both webapp and src modules for instance-specific settings.
Supports ${ENV_VAR} syntax in YAML values for secret interpolation.
Actual secret values are stored in .env (gitignored), while the YAML
structure stays in instance.yaml.
"""

import logging
import os
import re
from pathlib import Path
from typing import Any

import yaml

logger = logging.getLogger(__name__)

CONFIG_DIR = Path(os.environ.get("CONFIG_DIR", "./config"))

_ENV_PATTERN = re.compile(r"\$\{([^}]+)\}")


def _resolve_env_refs(value: Any) -> Any:
    """Resolve ${ENV_VAR} references in config values.

    Walks the config tree recursively. String values containing ${VAR}
    are replaced with the corresponding environment variable value
    (empty string if not set). Non-string values pass through unchanged.
    """
    if isinstance(value, str):
        def replacer(match: re.Match) -> str:
            env_key = match.group(1)
            return os.environ.get(env_key, "")

        return _ENV_PATTERN.sub(replacer, value)
    if isinstance(value, dict):
        return {k: _resolve_env_refs(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_resolve_env_refs(item) for item in value]
    return value


def load_instance_config() -> dict[str, Any]:
    """Load instance configuration from instance.yaml.

    Search order:
        1. $CONFIG_DIR/instance.yaml
        2. ./config/instance.yaml

    Raises:
        FileNotFoundError: If instance.yaml not found.
        yaml.YAMLError: If YAML is invalid.
        ValueError: If config is empty or missing required fields.
    """
    path = CONFIG_DIR / "instance.yaml"
    if not path.exists():
        # Fallback to local config dir
        path = Path("./config/instance.yaml")
        if not path.exists():
            raise FileNotFoundError(
                "Instance configuration not found. "
                "Copy config/instance.yaml.example to config/instance.yaml "
                "and fill in your values."
            )
    with open(path) as f:
        config = yaml.safe_load(f)
    if not config:
        raise ValueError("instance.yaml is empty")
    config = _resolve_env_refs(config)
    _validate_config(config)
    logger.info("Instance config loaded from %s", path)
    return config


def _validate_config(config: dict) -> None:
    """Validate required configuration fields.

    Raises:
        ValueError: If required fields are missing or empty.
    """
    required_paths = [
        ("instance", "name"),
        ("auth", "allowed_domain"),
        ("server", "host"),
        ("server", "hostname"),
    ]
    missing = []
    for keys in required_paths:
        value = config
        path_str = ".".join(keys)
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                missing.append(path_str)
                break
            value = value[key]
        else:
            if not value:
                missing.append(path_str)
    if missing:
        raise ValueError(
            f"Missing required instance config fields: {', '.join(missing)}. "
            "Check config/instance.yaml"
        )


def get_instance_value(config: dict, *keys: str, default: Any = None) -> Any:
    """Get a nested value from instance config.

    Args:
        config: Instance config dict.
        *keys: Path of keys (e.g., "instance", "name").
        default: Default value if path not found.

    Returns:
        Config value or default.
    """
    value = config
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value if value is not None else default
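
The `${ENV_VAR}` interpolation described in the module docstring can be illustrated with a small stand-alone sketch. It re-declares a copy of the resolver so the snippet runs outside the repository; in the project itself one would presumably call `load_instance_config()` instead. The variable names `DEMO_ALLOWED_DOMAIN` and `DEMO_UNSET_SECRET`, the `secret` key, and the sample values are hypothetical and chosen only for this demonstration.

```python
import os
import re
from typing import Any

# Same pattern as the loader: matches ${NAME} and captures NAME.
ENV_PATTERN = re.compile(r"\$\{([^}]+)\}")


def resolve_env_refs(value: Any) -> Any:
    """Illustrative copy of the loader's resolver (not imported from the repo)."""
    if isinstance(value, str):
        # Unset variables are replaced with an empty string.
        return ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: resolve_env_refs(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item) for item in value]
    # Numbers, booleans and None pass through unchanged.
    return value


# Hypothetical environment, set here only for the demonstration.
os.environ["DEMO_ALLOWED_DOMAIN"] = "example.com"
os.environ.pop("DEMO_UNSET_SECRET", None)

# Hypothetical config tree, shaped like a parsed instance.yaml.
sample = {
    "auth": {"allowed_domain": "${DEMO_ALLOWED_DOMAIN}"},
    "server": {"host": "10.0.0.5", "port": 8080},
    "admins": ["admin@${DEMO_ALLOWED_DOMAIN}"],
    "secret": "${DEMO_UNSET_SECRET}",
}

resolved = resolve_env_refs(sample)
print(resolved)
```

Because an unset variable resolves to an empty string, a required field such as `auth.allowed_domain` that refers to a missing variable ends up empty and is then reported as missing by `_validate_config`.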