# 05-EXECUTION: Comprehensive Improvement of ML Models for Trading

**Task ID:** TASK-2026-01-25-ML-TRAINING-ENHANCEMENT
**Phase:** E - Execution
**Status:** In progress (Phases 1-3 completed, Phase 4 pending)
**Date:** 2026-01-25

---

## 1. EXECUTION LOG

### 1.1 PHASE 1: INFRASTRUCTURE ✅ COMPLETED

#### TASK-1.1: Data Pipeline ✅

| Subtask | Status | Start | End | Notes |
|---------|--------|-------|-----|-------|
| 1.1.1 Migrate data MySQL→PostgreSQL | ✅ Completed | 2026-01-25 | 2026-01-25 | Script created: migrate_historical_data.py |
| 1.1.2 Implement data loader | ✅ Completed | 2026-01-25 | 2026-01-25 | training_loader.py (~300 lines) |
| 1.1.3 Create quality validators | ✅ Completed | 2026-01-25 | 2026-01-25 | validators.py (~200 lines) |
| 1.1.4 Document schema and pipelines | ✅ Completed | 2026-01-25 | 2026-01-25 | DATA-PIPELINE-SPEC.md |

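Subtask 1.1.3 names quality validators without describing what they check. The following is a minimal sketch of one plausible check on OHLCV bars. The function name `validate_ohlcv_bars`, the dict-based bar layout, and the specific rules are assumptions for illustration, not the contents of validators.py.

```python
from typing import Dict, List

Bar = Dict[str, float]  # hypothetical layout: ts, open, high, low, close, volume


def validate_ohlcv_bars(bars: List[Bar]) -> List[str]:
    """Return a list of human-readable problems found in a series of bars."""
    problems: List[str] = []
    prev_ts = None
    for i, bar in enumerate(bars):
        # The high must bound open/close from above, the low from below.
        if bar["high"] < max(bar["open"], bar["close"]):
            problems.append(f"bar {i}: high below open/close")
        if bar["low"] > min(bar["open"], bar["close"]):
            problems.append(f"bar {i}: low above open/close")
        if bar["volume"] < 0:
            problems.append(f"bar {i}: negative volume")
        # Timestamps must be strictly increasing (no duplicates, no reordering).
        if prev_ts is not None and bar["ts"] <= prev_ts:
            problems.append(f"bar {i}: timestamp not increasing")
        prev_ts = bar["ts"]
    return problems
```

Returning a list of problems instead of raising on the first one lets a migration run report every bad row in a single pass.
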
#### TASK-1.2: Attention Architecture ✅

| Subtask | Status | Start | End | Notes |
|---------|--------|-------|-----|-------|
| 1.2.1 Implement Price-Focused Attention | ✅ Completed | 2026-01-25 | 2026-01-25 | price_attention.py (~400 lines) |
| 1.2.2 Implement Positional Encoding | ✅ Completed | 2026-01-25 | 2026-01-25 | positional_encoding.py (~300 lines) |
| 1.2.3 Create attention score extractor | ✅ Completed | 2026-01-25 | 2026-01-25 | attention_extractor.py (~500 lines) |
| 1.2.4 Attention unit tests | ✅ Completed | 2026-01-25 | 2026-01-25 | test_attention_architecture.py (37 tests) |

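For readers unfamiliar with subtask 1.2.2, here is a minimal sketch of the standard sinusoidal positional encoding used in Transformer models. This log does not state which scheme positional_encoding.py implements (sinusoidal, learned, or time-aware), so treat the function below as an illustrative assumption.

```python
import math
from typing import List


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Build a (seq_len x d_model) table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each sin/cos pair (dimensions 2k and 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```
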
---

### 1.2 PHASE 2: STRATEGIES (Parallel) ✅ COMPLETED

#### TASK-2.1: Strategy PVA ✅

| Subtask | Status | Agent | Notes |
|---------|--------|-------|-------|
| 2.1.1 Returns feature engineering | ✅ | general-purpose | feature_engineering.py (~700 lines) |
| 2.1.2 Transformer Encoder | ✅ | general-purpose | Uses the existing PriceFocusedAttention |
| 2.1.3 XGBoost prediction head | ✅ | general-purpose | model.py (~920 lines) |
| 2.1.4 Train per asset | ✅ | general-purpose | trainer.py (~790 lines) |
| 2.1.5 Walk-forward validation | ✅ | general-purpose | Included in trainer |
| 2.1.6 Documentation | ✅ | general-purpose | `__init__.py` with docstrings |

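Walk-forward validation (subtask 2.1.5) trains on a window of past data and tests on the window that immediately follows, then rolls both forward. The sketch below shows a rolling-window variant. The function name and the window sizes are hypothetical, and the splitting scheme actually used in trainer.py is not described in this log.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs that roll forward through time."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        # Advance by one test window so test folds never overlap.
        start += test_size
```

Every test index is later than every training index in the same fold, which keeps future prices out of the training data.
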
#### TASK-2.2: Strategy MRD ✅

| Subtask | Status | Agent | Notes |
|---------|--------|-------|-------|
| 2.2.1 HMM regimes | ✅ | general-purpose | hmm_regime.py (~450 lines) |
| 2.2.2 Momentum features | ✅ | general-purpose | feature_engineering.py (~540 lines) |
| 2.2.3 LSTM + XGBoost | ✅ | general-purpose | model.py (~600 lines) |
| 2.2.4 Train per asset | ✅ | general-purpose | trainer.py (~530 lines) |
| 2.2.5 Validate regimes | ✅ | general-purpose | Included in trainer |
| 2.2.6 Documentation | ✅ | general-purpose | `__init__.py` |

#### TASK-2.3: Strategy VBP ✅

| Subtask | Status | Agent | Notes |
|---------|--------|-------|-------|
| 2.3.1 Volatility features | ✅ | general-purpose | feature_engineering.py |
| 2.3.2 CNN 1D + Attention | ✅ | general-purpose | cnn_encoder.py |
| 2.3.3 Balanced sampling | ✅ | general-purpose | 3x oversampling of breakouts |
| 2.3.4 Train per asset | ✅ | general-purpose | trainer.py |
| 2.3.5 Validate breakouts | ✅ | general-purpose | Specialized metrics |
| 2.3.6 Documentation | ✅ | general-purpose | `__init__.py` |

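Subtask 2.3.3 notes a 3x oversampling of breakout samples. One simple way to do that is to repeat each positive sample three times before shuffling, as sketched here. The function name, the `(features, label)` tuple layout, and the use of label `1` for breakouts are assumptions, and the real sampler may work differently (for example, through weighted sampling).

```python
import random
from typing import List, Sequence, Tuple


def oversample_breakouts(
    samples: Sequence[Tuple[list, int]], factor: int = 3, seed: int = 0
) -> List[Tuple[list, int]]:
    """Repeat each positive (label == 1) sample `factor` times, then shuffle."""
    balanced: List[Tuple[list, int]] = []
    for features, label in samples:
        copies = factor if label == 1 else 1
        balanced.extend([(features, label)] * copies)
    # Shuffle with a fixed seed so training batches are reproducible.
    random.Random(seed).shuffle(balanced)
    return balanced
```

Oversampling should be applied to the training split only. Duplicating samples before the train/test split would place copies of the same bar on both sides.
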
#### TASK-2.4: Strategy MSA ✅

| Subtask | Status | Agent | Notes |
|---------|--------|-------|-------|
| 2.4.1 Swing point detector | ✅ | general-purpose | structure_detector.py (~800 lines) |
| 2.4.2 ICT/SMC features | ✅ | general-purpose | BOS, CHoCH, FVG, OB implemented |
| 2.4.3 XGBoost model | ✅ | general-purpose | model.py (~470 lines) |
| 2.4.4 Train per asset | ✅ | general-purpose | trainer.py (~470 lines) |
| 2.4.5 Validate structure | ✅ | general-purpose | Metrics per prediction type |
| 2.4.6 Documentation | ✅ | general-purpose | `__init__.py` |

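As context for subtask 2.4.1, a swing high is commonly defined as a bar whose high exceeds the highs of a fixed number of bars on each side, and a swing low is the mirror case. The sketch below implements that fractal-style definition. The function name and the default `window` are assumptions, and structure_detector.py may use a different rule.

```python
from typing import List, Tuple


def find_swing_points(
    highs: List[float], lows: List[float], window: int = 2
) -> Tuple[List[int], List[int]]:
    """Return indices of swing highs and swing lows using a fractal window."""
    swing_highs, swing_lows = [], []
    for i in range(window, len(highs) - window):
        # Indices of the `window` bars on each side of bar i.
        neighbors = list(range(i - window, i)) + list(range(i + 1, i + window + 1))
        if all(highs[i] > highs[j] for j in neighbors):
            swing_highs.append(i)
        if all(lows[i] < lows[j] for j in neighbors):
            swing_lows.append(i)
    return swing_highs, swing_lows
```

A swing point at bar `i` is only confirmed `window` bars later, so any feature built from it has to be lagged accordingly to avoid look-ahead.
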
#### TASK-2.5: Strategy MTS ✅

| Subtask | Status | Agent | Notes |
|---------|--------|-------|-------|
| 2.5.1 Multi-timeframe aggregation | ✅ | general-purpose | feature_engineering.py |
| 2.5.2 Hierarchical Attention | ✅ | general-purpose | hierarchical_attention.py |
| 2.5.3 Signal synthesis | ✅ | general-purpose | model.py with XGBoost |
| 2.5.4 Train per asset | ✅ | general-purpose | trainer.py |
| 2.5.5 Validate alignment | ✅ | general-purpose | Alignment metrics |
| 2.5.6 Documentation | ✅ | general-purpose | `__init__.py` |

---

### 1.3 PHASE 3: INTEGRATION ✅ COMPLETED

#### TASK-3.1: Metamodel Ensemble ✅

| Subtask | Status | Start | End | Notes |
|---------|--------|-------|-----|-------|
| 3.1.1 Neural Gating Network | ✅ | 2026-01-25 | 2026-01-25 | gating_network.py + entropy regularization |
| 3.1.2 Ensemble pipeline | ✅ | 2026-01-25 | 2026-01-25 | ensemble_pipeline.py |
| 3.1.3 Train gating | ✅ | 2026-01-25 | 2026-01-25 | trainer.py with walk-forward |
| 3.1.4 Confidence calibration | ✅ | 2026-01-25 | 2026-01-25 | calibration.py (isotonic, Platt, temperature) |
| 3.1.5 Document architecture | ✅ | 2026-01-25 | 2026-01-25 | model.py + `__init__.py` |

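The gating network (3.1.1) and temperature calibration (3.1.4) both rest on a softmax over logits. The sketch below shows the arithmetic only: gate weights from a softmax, their Shannon entropy (the quantity an entropy regularizer would typically reward so the gate does not collapse onto one strategy), and a weighted combination of per-strategy predictions. All function names are hypothetical, and the real gate is a trained neural network, which is not reproduced here.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax with temperature; temperature > 1 flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(weights: List[float]) -> float:
    """Shannon entropy (nats) of the gate weights; higher means more spread."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def gated_prediction(gate_logits: List[float], strategy_preds: List[float]) -> float:
    """Combine per-strategy predictions using softmax gate weights."""
    weights = softmax(gate_logits)
    return sum(w * p for w, p in zip(weights, strategy_preds))
```

With five strategies (PVA, MRD, VBP, MSA, MTS) and equal logits, each gate weight is 0.2 and the entropy reaches its maximum of ln 5. In temperature scaling, the single scalar `temperature` is fitted on held-out data.
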
#### TASK-3.2: LLM Integration ✅

| Subtask | Status | Start | End | Notes |
|---------|--------|-------|-----|-------|
| 3.2.1 Prompt structure | ✅ | 2026-01-25 | 2026-01-25 | prompts/trading_decision.py |
| 3.2.2 Signal Formatter | ✅ | 2026-01-25 | 2026-01-25 | signal_formatter.py |
| 3.2.3 Integrate LLM Agent | ✅ | 2026-01-25 | 2026-01-25 | llm_client.py (Ollama + Claude fallback) |
| 3.2.4 Signal Logger | ✅ | 2026-01-25 | 2026-01-25 | signal_logger.py + DDL ml.llm_signals |
| 3.2.5 Document flow | ✅ | 2026-01-25 | 2026-01-25 | integration.py + decision_parser.py |

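Subtask 3.2.3 describes a primary backend with a fallback. A minimal sketch of that pattern follows, with backends modeled as plain callables. `complete_with_fallback`, `local_backend`, and `remote_backend` are hypothetical stand-ins, no real Ollama or Claude API call is made, and the interface of llm_client.py is not shown in this log.

```python
from typing import Callable, Sequence


def complete_with_fallback(prompt: str, backends: Sequence[Callable[[str], str]]) -> str:
    """Try each backend in order and return the first successful response."""
    errors = []
    for backend in backends:
        try:
            return backend(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{getattr(backend, '__name__', 'backend')}: {exc}")
    raise RuntimeError("all LLM backends failed: " + "; ".join(errors))


# Hypothetical stand-ins: a failing local model and a working remote one.
def local_backend(prompt: str) -> str:
    raise ConnectionError("local model unavailable")


def remote_backend(prompt: str) -> str:
    return "remote: " + prompt
```

Collecting the error from each failed backend keeps the final exception informative when every option is down.
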
---

### 1.4 PHASE 4: VALIDATION

#### TASK-4.1: Backtesting Validation

| Subtask | Status | Start | End | Notes |
|---------|--------|-------|-----|-------|
| 4.1.1-4.1.6 | Pending | - | - | - |

---

## 2. FILES CREATED

### Phase 1.1 - Data Pipeline

| File | Type | Lines | Commit |
|------|------|-------|--------|
| apps/ml-engine/src/data/training_loader.py | module | ~300 | pending |
| apps/ml-engine/src/data/dataset.py | module | ~250 | pending |
| apps/ml-engine/src/data/validators.py | module | ~200 | pending |
| `apps/ml-engine/src/data/__init__.py` | init | ~50 | pending |
| apps/data-service/scripts/migrate_historical_data.py | script | ~400 | pending |
| docs/.../implementacion/DATA-PIPELINE-SPEC.md | docs | ~200 | pending |

### Phase 1.2 - Attention Architecture

| File | Type | Lines | Commit |
|------|------|-------|--------|
| apps/ml-engine/src/models/attention/multi_head_attention.py | module | ~300 | pending |
| apps/ml-engine/src/models/attention/positional_encoding.py | module | ~300 | pending |
| apps/ml-engine/src/models/attention/price_attention.py | module | ~400 | pending |
| apps/ml-engine/src/models/attention/attention_extractor.py | module | ~500 | pending |
| `apps/ml-engine/src/models/attention/__init__.py` | init | ~100 | pending |
| apps/ml-engine/tests/test_attention_architecture.py | tests | ~600 | pending |

**Total Phase 1:** 12 files, ~3,600 lines

### Phase 2 - Model Strategies

#### PVA (Price Variation Attention)

| File | Lines |
|------|-------|
| strategies/pva/feature_engineering.py | ~700 |
| strategies/pva/model.py | ~920 |
| strategies/pva/trainer.py | ~790 |
| `strategies/pva/__init__.py` | ~110 |

#### MRD (Momentum Regime Detection)

| File | Lines |
|------|-------|
| strategies/mrd/feature_engineering.py | ~540 |
| strategies/mrd/hmm_regime.py | ~450 |
| strategies/mrd/model.py | ~600 |
| strategies/mrd/trainer.py | ~530 |
| `strategies/mrd/__init__.py` | ~85 |

#### VBP (Volatility Breakout Predictor)

| File | Lines |
|------|-------|
| strategies/vbp/feature_engineering.py | ~500 |
| strategies/vbp/cnn_encoder.py | ~400 |
| strategies/vbp/model.py | ~500 |
| strategies/vbp/trainer.py | ~450 |
| `strategies/vbp/__init__.py` | ~80 |

#### MSA (Market Structure Analysis)

| File | Lines |
|------|-------|
| strategies/msa/structure_detector.py | ~800 |
| strategies/msa/feature_engineering.py | ~570 |
| strategies/msa/model.py | ~470 |
| strategies/msa/trainer.py | ~470 |
| `strategies/msa/__init__.py` | ~90 |

#### MTS (Multi-Timeframe Synthesis)

| File | Lines |
|------|-------|
| strategies/mts/feature_engineering.py | ~500 |
| strategies/mts/hierarchical_attention.py | ~450 |
| strategies/mts/model.py | ~500 |
| strategies/mts/trainer.py | ~480 |
| `strategies/mts/__init__.py` | ~85 |

**Total Phase 2:** 24 files, ~11,000+ lines

---

## 3. FILES MODIFIED

*(To be updated during execution)*

| File | Change | Commit |
|------|--------|--------|
| - | - | - |

---

## 4. VALIDATIONS

| Validation | Status | Output |
|------------|--------|--------|
| Build ML Engine | Pending | - |
| Tests ML Engine | Pending | - |
| Lint Python | Pending | - |
| Backtesting | Pending | - |

---

## 5. PROGRESS METRICS

| Phase | Subtasks | Completed | % |
|-------|----------|-----------|---|
| PHASE 1 | 8 | 8 | **100%** ✅ |
| PHASE 2 | 30 | 30 | **100%** ✅ |
| PHASE 3 | 10 | 10 | **100%** ✅ |
| PHASE 4 | 6 | 0 | 0% |
| **TOTAL** | **54** | **48** | **89%** |

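The totals in the progress table can be checked with a few lines of arithmetic. The sketch below uses the per-phase counts from the table. The helper name `progress_percent` and the choice to round to the nearest whole percent are assumptions.

```python
from typing import Dict, Tuple

# (total subtasks, completed subtasks) per phase, copied from the progress table.
phases: Dict[str, Tuple[int, int]] = {
    "Phase 1": (8, 8),
    "Phase 2": (30, 30),
    "Phase 3": (10, 10),
    "Phase 4": (6, 0),
}


def progress_percent(total: int, done: int) -> int:
    """Completion percentage rounded to the nearest whole percent."""
    return round(100 * done / total)


total = sum(t for t, _ in phases.values())
done = sum(d for _, d in phases.values())
overall = progress_percent(total, done)  # 48 of 54 subtasks
```

Here 48 / 54 ≈ 88.9%, which rounds to the 89% shown in the TOTAL row.
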
---

## 6. ISSUES AND BLOCKERS

*(To be updated during execution)*

| ID | Description | Severity | Status | Resolution |
|----|-------------|----------|--------|------------|
| - | - | - | - | - |

---

## 7. COMMITS

*(To be updated during execution)*

| Hash | Message | Date |
|------|---------|------|
| - | - | - |

---

**Next action:** Start PHASE 4 - Backtesting Validation (TASK-4.1)