trading-platform/orchestration/tareas/TASK-2026-01-25-ML-DATA-MIGRATION/05-EJECUCION.md
Adrian Flores Cortes c4d1524793 [TASK-2026-01-25-ML-DATA-MIGRATION] docs: Add CAPVED documentation for ML data migration task
- Created full CAPVED folder with METADATA, 01-06 phases, and SUMMARY
- Updated _INDEX.yml with new task entry
- Documents: Polygon data loading, MySQL→PostgreSQL migration, 12 attention models

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 06:17:40 -06:00

# 05-EJECUCION - ML Data Migration & Model Training
## Date: 2026-01-25
## Phase: EXECUTION (E)
## Status: COMPLETED
---
## 1. Python Environment
### 1.1 Creating the Virtual Environment
```bash
# Create the venv under the Linux home directory (avoids cross-filesystem issues)
wsl -d Ubuntu-24.04 -u developer -- python3 -m venv ~/venvs/data-service
# Install dependencies
wsl -d Ubuntu-24.04 -u developer -- ~/venvs/data-service/bin/pip install \
    aiohttp asyncpg pandas numpy python-dotenv structlog
```
### 1.2 ML Dependencies
```bash
wsl -d Ubuntu-24.04 -u developer -- ~/venvs/data-service/bin/pip install \
    xgboost scikit-learn joblib sqlalchemy pyyaml loguru psycopg2-binary
```
---
## 2. Loading Data from Polygon
### 2.1 Script Created: `apps/data-service/scripts/fetch_polygon_data.py`
Features:
- Async requests to the Polygon API with aiohttp
- Rate limiting (5 req/min)
- Batch inserts with asyncpg
- ON CONFLICT handling for upserts
- Timezone normalization
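
The 5 req/min rate limit listed above can be sketched as an evenly spaced async limiter. This is a minimal illustration, not the code in `fetch_polygon_data.py`: the class name `MinIntervalLimiter` and its `wait()` method are hypothetical, and spacing requests evenly is only one possible way to respect the limit.

```python
import asyncio
import time


class MinIntervalLimiter:
    """Hypothetical helper: allow at most `max_calls` per `period` seconds
    by enforcing a fixed minimum gap between consecutive calls."""

    def __init__(self, max_calls: int = 5, period: float = 60.0) -> None:
        # 5 requests per 60 s means one request every 12 s.
        self.min_interval = period / max_calls
        self._next_allowed = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> float:
        """Sleep until the next slot is free and return the seconds slept."""
        async with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            if delay:
                await asyncio.sleep(delay)
            # Reserve the following slot relative to when this call proceeds.
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
            return delay
```

A caller would `await limiter.wait()` immediately before each HTTP request, so concurrent tasks share a single request budget.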
### 2.2 Execution
```bash
cd /mnt/c/Empresas/ISEM/workspace-v2/projects/trading-platform/apps/data-service
~/venvs/data-service/bin/python scripts/fetch_polygon_data.py
```
### 2.3 Result
- Total time: ~2 hours (bounded by the rate limit)
- Bars loaded: 469,217
- No errors
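
As a rough sanity check on these figures, the snippet below estimates how many requests fit in the run and how many bars each one returned on average. It assumes the limiter was saturated for the full two hours, which the log does not confirm, so the numbers are an approximation only.

```python
# Figures taken from this document.
RATE_LIMIT_PER_MIN = 5
ELAPSED_HOURS = 2          # "~2 hours", so this is approximate
BARS_LOADED = 469_217

# Upper bound on requests if the rate limit was the only bottleneck.
max_requests = RATE_LIMIT_PER_MIN * 60 * ELAPSED_HOURS

# Average bars per request under that assumption.
bars_per_request = BARS_LOADED / max_requests

print(f"requests <= {max_requests}, about {bars_per_request:.0f} bars per request")
```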
---
## 3. Migrating the ML Engine to PostgreSQL
### 3.1 Files Created
**`apps/ml-engine/src/data/database.py`** (356 lines)
- `PostgreSQLConnection` class
- Methods: `get_ticker_data()`, `execute_query()`, `get_all_tickers()`
- Automatic MySQL→PostgreSQL translation
- `MySQLConnection` alias for backward compatibility
**`apps/ml-engine/src/data/__init__.py`**
- Exports: DatabaseManager, PostgreSQLConnection, load_ohlcv_from_postgres
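
To illustrate what the automatic translation and the compatibility alias might look like, here is a minimal sketch. It assumes the translation operates on SQL query text; the function `translate_mysql_to_postgres`, the `prepare()` method, and the three rewrite rules are hypothetical and are not taken from `database.py`.

```python
import re


def translate_mysql_to_postgres(query: str) -> str:
    """Rewrite a few common MySQL idioms into PostgreSQL syntax.

    Illustrative subset only. Regex rewriting also touches string literals,
    so a real implementation would need to be more careful.
    """
    # Backtick-quoted identifiers become double-quoted identifiers.
    query = re.sub(r"`([^`]+)`", r'"\1"', query)
    # IFNULL(a, b) becomes COALESCE(a, b).
    query = re.sub(r"\bIFNULL\s*\(", "COALESCE(", query, flags=re.IGNORECASE)
    # LIMIT offset, count becomes LIMIT count OFFSET offset.
    query = re.sub(
        r"\bLIMIT\s+(\d+)\s*,\s*(\d+)",
        r"LIMIT \2 OFFSET \1",
        query,
        flags=re.IGNORECASE,
    )
    return query


class PostgreSQLConnection:
    """Stand-in for the real class; only a translation hook is sketched."""

    def prepare(self, query: str) -> str:  # hypothetical method name
        return translate_mysql_to_postgres(query)


# Alias so legacy code importing MySQLConnection keeps working unchanged.
MySQLConnection = PostgreSQLConnection
```

With an alias like this, existing call sites that instantiate `MySQLConnection` receive a PostgreSQL-backed object without any import changes.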
### 3.2 Updated Configuration
**`apps/ml-engine/config/database.yaml`**
```yaml
postgres:
  host: localhost
  port: 5432
  database: trading_platform
  user: trading_user
  password: trading_dev_2026
mysql:
  _deprecated: true
```
**`apps/ml-engine/.env`**
```
DB_HOST=localhost
DB_PORT=5432
DB_NAME=trading_platform
DB_USER=trading_user
DB_PASSWORD=trading_dev_2026
```
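
The `DB_*` variables above could be turned into a connection URL along these lines. This is a sketch under assumptions: the helper `build_postgres_dsn` is hypothetical, and the ML engine may assemble its connection differently (for example from `database.yaml`).

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def build_postgres_dsn(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a postgresql:// URL from DB_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    # These three have no sensible default, so a missing key raises KeyError.
    name = env["DB_NAME"]
    user = env["DB_USER"]
    password = env["DB_PASSWORD"]
    # Percent-encode credentials in case they contain reserved characters.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )
```

The resulting URL has the form accepted by SQLAlchemy's `create_engine`, which is among the installed dependencies.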
---
## 4. Model Training
### 4.1 Execution
```bash
cd /mnt/c/Empresas/ISEM/workspace-v2/projects/trading-platform/apps/ml-engine
~/venvs/data-service/bin/python -m training.train_attention_models
```
### 4.2 Result
- 12 models trained (6 symbols x 2 timeframes)
- Each model: regressor + classifier + metadata
- Report: `ATTENTION_TRAINING_REPORT_20260125_060911.md`
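
The model count follows from the symbol/timeframe grid, which can be enumerated as below. The symbol and timeframe names are placeholders, since this document does not list the actual ones.

```python
from itertools import product

# Placeholders: the real symbols and timeframes are not listed in this document.
SYMBOLS = [f"SYMBOL_{i}" for i in range(1, 7)]
TIMEFRAMES = ["TF_A", "TF_B"]
ARTIFACTS = ("regressor", "classifier", "metadata")

# One model per (symbol, timeframe) pair.
plan = list(product(SYMBOLS, TIMEFRAMES))

# Each model is stored as three artifacts.
expected_artifacts = [(s, tf, a) for (s, tf) in plan for a in ARTIFACTS]

print(len(plan), "models,", len(expected_artifacts), "artifacts")
```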
---
## 5. Commits
| Repo | Hash | Message |
|------|------|---------|
| trading-platform | ffee190 | docs: Update DATABASE/ML_INVENTORY |
| ml-engine-v2 | 475e913 | config: Update database.yaml |
| data-service-v2 | 0e20c7c | feat: Add Polygon fetch script |
| workspace-v2 | 9b9ca7b0 | chore: Update submodules |
---
## 6. Problems Resolved
### 6.1 PEP 668 Restriction
- **Error:** "externally-managed-environment"
- **Solution:** Use a venv instead of a global pip install
### 6.2 Cross-Filesystem Venv
- **Error:** A venv created under /mnt/c did not work correctly
- **Solution:** Create the venv under ~/venvs/ (native Linux filesystem)
### 6.3 Timezone Comparison
- **Error:** "can't compare offset-naive and offset-aware datetimes"
- **Solution:** `.replace(tzinfo=None)` on timestamps from PostgreSQL
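
The timezone error and its fix can be reproduced in isolation. The sketch below goes one step beyond the fix recorded above: it converts to UTC before dropping `tzinfo`, since a bare `.replace(tzinfo=None)` keeps the wall-clock value of whatever offset the timestamp carried. The helper name `to_naive_utc` is hypothetical.

```python
from datetime import datetime, timezone


def to_naive_utc(ts: datetime) -> datetime:
    """Return a naive datetime expressed in UTC (hypothetical helper)."""
    if ts.tzinfo is None:
        return ts  # already naive; assumed here to represent UTC
    return ts.astimezone(timezone.utc).replace(tzinfo=None)


# An aware value, as a driver might return for a timestamptz column.
aware = datetime(2026, 1, 25, 12, 0, tzinfo=timezone.utc)
# A naive value, as might come from parsing an API response.
naive = datetime(2026, 1, 25, 11, 0)

error_message = None
try:
    aware > naive  # ordering comparisons between aware and naive values fail
except TypeError as exc:
    error_message = str(exc)

# After normalization both values are naive and compare cleanly.
is_later = to_naive_utc(aware) > naive
```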