trading-platform/orchestration/tareas/TASK-2026-01-25-ML-DATA-MIGRATION/05-EJECUCION.md
Adrian Flores Cortes c4d1524793 [TASK-2026-01-25-ML-DATA-MIGRATION] docs: Add CAPVED documentation for ML data migration task
- Created full CAPVED folder with METADATA, 01-06 phases, and SUMMARY
- Updated _INDEX.yml with new task entry
- Documents: Polygon data loading, MySQL→PostgreSQL migration, 12 attention models

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 06:17:40 -06:00

05-EJECUCION (Execution) - ML Data Migration & Model Training

Date: 2026-01-25

Phase: EXECUTION (E)

Status: COMPLETED


1. Python Environment

1.1 Creating the Virtual Environment

# Create the venv in the Linux home directory (avoids cross-filesystem issues with /mnt/c)
wsl -d Ubuntu-24.04 -u developer -- python3 -m venv ~/venvs/data-service

# Install dependencies
wsl -d Ubuntu-24.04 -u developer -- ~/venvs/data-service/bin/pip install \
  aiohttp asyncpg pandas numpy python-dotenv structlog

1.2 ML Dependencies

wsl -d Ubuntu-24.04 -u developer -- ~/venvs/data-service/bin/pip install \
  xgboost scikit-learn joblib sqlalchemy pyyaml loguru psycopg2-binary

2. Loading Data from Polygon

2.1 Script Created: apps/data-service/scripts/fetch_polygon_data.py

Features:

  • Async requests to the Polygon API with aiohttp
  • Rate limiting (5 req/min)
  • Batch inserts with asyncpg
  • ON CONFLICT handling for upserts
  • Timezone normalization
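The rate limiting and upsert behavior listed above could look roughly like the sketch below. It is not the contents of fetch_polygon_data.py: the `RateLimiter` class, the `upsert_bars` helper, and the `ohlcv_bars` table with its columns are hypothetical names chosen for illustration. Only the 5 req/min limit, the use of asyncpg, and the ON CONFLICT strategy come from this document.

```python
import asyncio
import time


class RateLimiter:
    """Allow at most `max_calls` acquisitions per `period` seconds (sliding window)."""

    def __init__(self, max_calls=5, period=60.0):
        self.max_calls = max_calls
        self.period = period
        self._calls = []              # monotonic timestamps of recent acquisitions
        self._lock = asyncio.Lock()   # create the limiter inside the running event loop

    async def acquire(self):
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget acquisitions that have left the window.
                self._calls = [t for t in self._calls if now - t < self.period]
                if len(self._calls) < self.max_calls:
                    break
                # Wait until the oldest acquisition leaves the window, then re-check.
                await asyncio.sleep(self.period - (now - self._calls[0]))
            self._calls.append(now)


# Hypothetical table and column names; the real schema is not shown in this document.
UPSERT_SQL = """
INSERT INTO ohlcv_bars (symbol, ts, open, high, low, close, volume)
VALUES ($1, $2, $3, $4, $5, $6, $7)
ON CONFLICT (symbol, ts) DO UPDATE
SET open = EXCLUDED.open, high = EXCLUDED.high, low = EXCLUDED.low,
    close = EXCLUDED.close, volume = EXCLUDED.volume
"""


async def upsert_bars(conn, rows):
    """Batch-upsert `rows` (7-tuples); `conn` is expected to be an asyncpg connection."""
    await conn.executemany(UPSERT_SQL, rows)
```

In a fetch loop, each HTTP request would be preceded by `await limiter.acquire()`, so re-running the script after an interruption stays within the limit, and the upsert makes the re-run idempotent for bars that were already stored.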

2.2 Execution

cd /mnt/c/Empresas/ISEM/workspace-v2/projects/trading-platform/apps/data-service
~/venvs/data-service/bin/python scripts/fetch_polygon_data.py

2.3 Result

  • Total time: ~2 hours (bounded by the 5 req/min rate limit)
  • Bars loaded: 469,217
  • No errors

3. Migrating the ML Engine to PostgreSQL

3.1 Files Created

apps/ml-engine/src/data/database.py (356 lines)

  • PostgreSQLConnection class
  • Methods: get_ticker_data(), execute_query(), get_all_tickers()
  • Automatic MySQL→PostgreSQL translation
  • MySQLConnection alias for backward compatibility
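As an illustration of the translation layer and the compatibility alias, here is a minimal sketch. The actual rules in database.py are not shown in this document, so the `translate_mysql_to_postgres` function, the `prepare` method, and the three rewrite rules below are assumptions that cover only a few common MySQL idioms.

```python
import re


def translate_mysql_to_postgres(sql: str) -> str:
    """Rewrite a few common MySQL idioms into PostgreSQL syntax (illustrative rules only)."""
    # MySQL quotes identifiers with backticks; PostgreSQL uses double quotes.
    # Note: this naive replacement would also touch backticks inside string literals.
    sql = sql.replace("`", '"')
    # IFNULL(a, b) -> COALESCE(a, b)
    sql = re.sub(r"\bIFNULL\s*\(", "COALESCE(", sql, flags=re.IGNORECASE)
    # LIMIT offset, count -> LIMIT count OFFSET offset
    sql = re.sub(
        r"\bLIMIT\s+(\d+)\s*,\s*(\d+)",
        r"LIMIT \2 OFFSET \1",
        sql,
        flags=re.IGNORECASE,
    )
    return sql


class PostgreSQLConnection:
    """Stand-in for the real class; only a hypothetical translation hook is sketched."""

    def prepare(self, sql: str) -> str:
        return translate_mysql_to_postgres(sql)


# Alias so that legacy code importing MySQLConnection keeps working unchanged.
MySQLConnection = PostgreSQLConnection
```

With this alias, a legacy call such as `MySQLConnection().prepare("SELECT * FROM `bars` LIMIT 10, 50")` would yield a query PostgreSQL accepts, which lets callers migrate without touching their imports.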

apps/ml-engine/src/data/__init__.py

  • Exports: DatabaseManager, PostgreSQLConnection, load_ohlcv_from_postgres

3.2 Updated Configuration

apps/ml-engine/config/database.yaml

postgres:
  host: localhost
  port: 5432
  database: trading_platform
  user: trading_user
  password: trading_dev_2026

mysql:
  _deprecated: true

apps/ml-engine/.env

DB_HOST=localhost
DB_PORT=5432
DB_NAME=trading_platform
DB_USER=trading_user
DB_PASSWORD=trading_dev_2026
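One way these variables might be consumed is sketched below. The `build_postgres_dsn` helper is hypothetical and not taken from the ML engine. It assumes the DB_* variables are already in the environment, for example after calling `load_dotenv()` from python-dotenv, and falls back to the local defaults shown above.

```python
import os
from urllib.parse import quote


def build_postgres_dsn(env=None) -> str:
    """Build a PostgreSQL URL from DB_* variables, falling back to local defaults."""
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "trading_platform")
    user = env.get("DB_USER", "trading_user")
    password = env.get("DB_PASSWORD", "")
    # Percent-encode credentials so characters such as "@" or "/" do not break the URL.
    user_q = quote(user, safe="")
    password_q = quote(password, safe="")
    return f"postgresql://{user_q}:{password_q}@{host}:{port}/{name}"
```

The resulting URL can be passed to SQLAlchemy's `create_engine` or to `asyncpg.connect`, both of which accept a `postgresql://` connection string.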

4. Model Training

4.1 Execution

cd /mnt/c/Empresas/ISEM/workspace-v2/projects/trading-platform/apps/ml-engine
~/venvs/data-service/bin/python -m training.train_attention_models

4.2 Result

  • 12 models trained (6 symbols x 2 timeframes)
  • Each model: regressor + classifier + metadata
  • Report: ATTENTION_TRAINING_REPORT_20260125_060911.md
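The model count follows from the symbol/timeframe grid. The sketch below enumerates that grid with placeholder names, since the actual symbols and timeframes are not listed in this document. The `training_jobs` helper is likewise hypothetical.

```python
from itertools import product

# Placeholder names: the real symbols and timeframes are not given in this document.
SYMBOLS = [f"SYMBOL_{i}" for i in range(1, 7)]   # 6 symbols
TIMEFRAMES = ["TF_A", "TF_B"]                    # 2 timeframes
ARTIFACTS = ("regressor", "classifier", "metadata")


def training_jobs(symbols, timeframes):
    """Return one job per (symbol, timeframe) pair, each producing three artifacts."""
    return [
        {"symbol": s, "timeframe": tf, "artifacts": ARTIFACTS}
        for s, tf in product(symbols, timeframes)
    ]


jobs = training_jobs(SYMBOLS, TIMEFRAMES)
print(len(jobs))                                  # number of models
print(sum(len(j["artifacts"]) for j in jobs))     # number of artifacts
```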

5. Commits Made

| Repo | Hash | Message |
| --- | --- | --- |
| trading-platform | ffee190 | docs: Update DATABASE/ML_INVENTORY |
| ml-engine-v2 | 475e913 | config: Update database.yaml |
| data-service-v2 | 0e20c7c | feat: Add Polygon fetch script |
| workspace-v2 | 9b9ca7b0 | chore: Update submodules |

6. Problems Resolved

6.1 PEP 668 Restriction

  • Error: "externally-managed-environment"
  • Solution: Use a venv instead of a global pip install
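To check ahead of time whether pip would refuse to install, a script can look for the PEP 668 marker file. The sketch below is an assumption about how such a check might be written and is not part of the task's scripts. The function names are hypothetical. The `EXTERNALLY-MANAGED` file name and its location in the stdlib directory come from PEP 668.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def is_externally_managed() -> bool:
    """True when the interpreter's stdlib directory carries the PEP 668 marker file."""
    stdlib_dir = sysconfig.get_path("stdlib")
    return Path(stdlib_dir, "EXTERNALLY-MANAGED").exists()


def pip_install_blocked() -> bool:
    """The marker only applies outside a virtual environment."""
    return is_externally_managed() and not in_virtualenv()


if __name__ == "__main__":
    print("inside venv:", in_virtualenv())
    print("global pip install blocked:", pip_install_blocked())
```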

6.2 Cross-Filesystem Venv

  • Error: a venv created under /mnt/c (the Windows filesystem mounted in WSL) did not work correctly
  • Solution: Create the venv under ~/venvs/ (native Linux filesystem)

6.3 Timezone Comparison

  • Error: "can't compare offset-naive and offset-aware datetimes"
  • Solution: Call .replace(tzinfo=None) on timestamps read from PostgreSQL so that both sides of the comparison are offset-naive
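The error and a fix can be reproduced with the standard library alone. The sketch below is a variant of the fix described above, not the project's code: it converts to UTC before dropping `tzinfo`, which gives the same result as a bare `.replace(tzinfo=None)` when the timestamps are already in UTC and avoids a silent shift when they are not. The `to_naive_utc` helper and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def to_naive_utc(ts: datetime) -> datetime:
    """Return a naive datetime in UTC; naive inputs are assumed to already be UTC."""
    if ts.tzinfo is None:
        return ts
    return ts.astimezone(timezone.utc).replace(tzinfo=None)


aware = datetime(2026, 1, 25, 12, 0, tzinfo=timezone.utc)  # e.g. a timestamptz column value
naive = datetime(2026, 1, 25, 6, 0)                        # e.g. a cutoff built without tzinfo

try:
    aware > naive
except TypeError as exc:
    # Mixing aware and naive datetimes in an ordering comparison raises TypeError.
    print(exc)

# After normalization both values are naive, so the comparison is valid.
print(to_naive_utc(aware) > naive)
```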