Adrian Flores Cortes 7bfcbb978e docs: Add OQI-006 DATA-PIPELINE-SPEC.md and ML-TRAINING-ENHANCEMENT task docs

- Added DATA-PIPELINE-SPEC.md for ML signals module
- Added TASK-2026-01-25-ML-TRAINING-ENHANCEMENT documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-25 14:32:37 -06:00

05-EXECUTION: Comprehensive Improvement of ML Models for Trading

Task ID: TASK-2026-01-25-ML-TRAINING-ENHANCEMENT
Phase: E - Execution
Status: Pending
Date: 2026-01-25

1. EXECUTION LOG

1.1 PHASE 1: INFRASTRUCTURE ✅ COMPLETED

TASK-1.1: Data Pipeline ✅

Subtask	Status	Start	End	Notes
1.1.1 Migrate data MySQL→PostgreSQL	✅ Completed	2026-01-25	2026-01-25	Script created: migrate_historical_data.py
1.1.2 Implement data loader	✅ Completed	2026-01-25	2026-01-25	training_loader.py (~300 lines)
1.1.3 Create quality validators	✅ Completed	2026-01-25	2026-01-25	validators.py (~200 lines)
1.1.4 Document schema and pipelines	✅ Completed	2026-01-25	2026-01-25	DATA-PIPELINE-SPEC.md
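
To make subtask 1.1.3 concrete, here is a minimal sketch of the kind of checks a quality validator for price bars might run. The `Bar` type, the `validate_bars` function, and the specific rules are assumptions for illustration; this log does not show the actual contents of validators.py.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bar:
    """Hypothetical OHLC bar; field names are assumed, not taken from validators.py."""
    ts: int        # epoch seconds
    open: float
    high: float
    low: float
    close: float


def validate_bars(bars: List[Bar]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    prev_ts: Optional[int] = None
    for i, b in enumerate(bars):
        # The high must sit at or above open/close, the low at or below them.
        if b.high < max(b.open, b.close) or b.low > min(b.open, b.close):
            problems.append(f"bar {i}: high/low do not bracket open/close")
        if min(b.open, b.high, b.low, b.close) <= 0:
            problems.append(f"bar {i}: non-positive price")
        # Duplicated or out-of-order timestamps usually indicate a bad migration.
        if prev_ts is not None and b.ts <= prev_ts:
            problems.append(f"bar {i}: timestamp not strictly increasing")
        prev_ts = b.ts
    return problems


good = [Bar(0, 1.0, 1.2, 0.9, 1.1), Bar(60, 1.1, 1.3, 1.0, 1.2)]
bad = [Bar(60, 1.0, 0.9, 0.8, 1.1), Bar(60, 1.1, 1.3, 1.0, 1.2)]
```

Returning a list of problems rather than raising on the first one lets a migration run report every bad row in a single pass.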

TASK-1.2: Attention Architecture ✅

Subtask	Status	Start	End	Notes
1.2.1 Implement Price-Focused Attention	✅ Completed	2026-01-25	2026-01-25	price_attention.py (~400 lines)
1.2.2 Implement Positional Encoding	✅ Completed	2026-01-25	2026-01-25	positional_encoding.py (~300 lines)
1.2.3 Create attention score extractor	✅ Completed	2026-01-25	2026-01-25	attention_extractor.py (~500 lines)
1.2.4 Attention unit tests	✅ Completed	2026-01-25	2026-01-25	test_attention_architecture.py (37 tests)
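
For readers unfamiliar with subtask 1.2.2, the sketch below shows the standard sinusoidal positional encoding used by Transformer models. Whether positional_encoding.py uses this scheme, a learned encoding, or something else is not stated in this log, so treat the function name and the choice of scheme as assumptions.

```python
import math
from typing import List


def sinusoidal_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Build a (seq_len x d_model) table: sine on even columns, cosine on odd columns."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Each column pair shares one frequency; frequencies fall geometrically.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


table = sinusoidal_encoding(seq_len=3, d_model=4)
```

The table is added to the input embeddings so that the attention layers can tell bar positions apart.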

1.2 PHASE 2: STRATEGIES (Parallel) ✅ COMPLETED

TASK-2.1: Strategy PVA ✅

Subtask	Status	Agent	Notes
2.1.1 Returns feature engineering	✅	general-purpose	feature_engineering.py (~700 lines)
2.1.2 Transformer Encoder	✅	general-purpose	Uses the existing PriceFocusedAttention
2.1.3 XGBoost prediction head	✅	general-purpose	model.py (~920 lines)
2.1.4 Train per asset	✅	general-purpose	trainer.py (~790 lines)
2.1.5 Walk-forward validation	✅	general-purpose	Included in trainer
2.1.6 Documentation	✅	general-purpose	__init__.py with docstrings
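
Walk-forward validation (subtask 2.1.5) evaluates a model only on data that comes after its training window, then slides both windows forward. The sketch below illustrates the index arithmetic with a rolling window; the function name, parameters, and the rolling (rather than expanding) window are assumptions, not a description of trainer.py.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_splits(
    n_samples: int,
    train_size: int,
    test_size: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; every test range starts right after its train range."""
    if step is None:
        step = test_size  # non-overlapping test windows by default
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Because no test index ever precedes a training index, the evaluation cannot leak future prices into the fit.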

TASK-2.2: Strategy MRD ✅

Subtask	Status	Agent	Notes
2.2.1 HMM regimes	✅	general-purpose	hmm_regime.py (~450 lines)
2.2.2 Momentum features	✅	general-purpose	feature_engineering.py (~540 lines)
2.2.3 LSTM + XGBoost	✅	general-purpose	model.py (~600 lines)
2.2.4 Train per asset	✅	general-purpose	trainer.py (~530 lines)
2.2.5 Validate regimes	✅	general-purpose	Included in trainer
2.2.6 Documentation	✅	general-purpose	__init__.py

TASK-2.3: Strategy VBP ✅

Subtask	Status	Agent	Notes
2.3.1 Volatility features	✅	general-purpose	feature_engineering.py
2.3.2 CNN 1D + Attention	✅	general-purpose	cnn_encoder.py
2.3.3 Balanced sampling	✅	general-purpose	3x oversampling of breakouts
2.3.4 Train per asset	✅	general-purpose	trainer.py
2.3.5 Validate breakouts	✅	general-purpose	Specialized metrics
2.3.6 Documentation	✅	general-purpose	__init__.py
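
The "3x oversampling of breakouts" note in subtask 2.3.3 can be read as repeating each breakout sample three times in the training set. The sketch below shows that reading; the function name, the label convention (1 = breakout), and the shuffle are assumptions, and the real sampler may instead use weighted sampling.

```python
import random
from typing import Any, List, Sequence, Tuple


def oversample_breakouts(
    samples: Sequence[Any],
    labels: Sequence[int],
    factor: int = 3,
    seed: int = 0,
) -> List[Tuple[Any, int]]:
    """Repeat every breakout sample (label 1) `factor` times, then shuffle deterministically."""
    out: List[Tuple[Any, int]] = []
    for x, y in zip(samples, labels):
        copies = factor if y == 1 else 1
        out.extend([(x, y)] * copies)
    random.Random(seed).shuffle(out)
    return out


balanced = oversample_breakouts(["a", "b", "c", "d"], [0, 1, 0, 0])
n_breakouts = sum(1 for _, y in balanced if y == 1)
```

Oversampling should be applied to the training fold only, so that duplicated breakouts never appear in the validation window.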

TASK-2.4: Strategy MSA ✅

Subtask	Status	Agent	Notes
2.4.1 Swing point detector	✅	general-purpose	structure_detector.py (~800 lines)
2.4.2 ICT/SMC features	✅	general-purpose	BOS, CHoCH, FVG, OB implemented
2.4.3 XGBoost model	✅	general-purpose	model.py (~470 lines)
2.4.4 Train per asset	✅	general-purpose	trainer.py (~470 lines)
2.4.5 Validate structure	✅	general-purpose	Metrics per prediction type
2.4.6 Documentation	✅	general-purpose	__init__.py

TASK-2.5: Strategy MTS ✅

Subtask	Status	Agent	Notes
2.5.1 Multi-TF aggregation	✅	general-purpose	feature_engineering.py
2.5.2 Hierarchical Attention	✅	general-purpose	hierarchical_attention.py
2.5.3 Signal synthesis	✅	general-purpose	model.py with XGBoost
2.5.4 Train per asset	✅	general-purpose	trainer.py
2.5.5 Validate alignment	✅	general-purpose	Alignment metrics
2.5.6 Documentation	✅	general-purpose	__init__.py

1.3 PHASE 3: INTEGRATION ✅ COMPLETED

TASK-3.1: Metamodel Ensemble ✅

Subtask	Status	Start	End	Notes
3.1.1 Neural Gating Network	✅	2026-01-25	2026-01-25	gating_network.py + entropy regularization
3.1.2 Ensemble pipeline	✅	2026-01-25	2026-01-25	ensemble_pipeline.py
3.1.3 Train gating	✅	2026-01-25	2026-01-25	trainer.py with walk-forward
3.1.4 Confidence calibration	✅	2026-01-25	2026-01-25	calibration.py (isotonic, Platt, temperature)
3.1.5 Document architecture	✅	2026-01-25	2026-01-25	model.py + __init__.py
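
A gating network (subtask 3.1.1) turns per-strategy scores into weights that sum to one and blends the strategy predictions with them. The sketch below shows that blend together with one common form of entropy regularization. The function names, the `beta` coefficient, and the sign convention (rewarding spread-out weights so the gate does not collapse onto a single strategy) are assumptions; gating_network.py may differ.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(weights: Sequence[float]) -> float:
    """Shannon entropy in nats; zero weights contribute nothing."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def gated_prediction(
    logits: Sequence[float], strategy_preds: Sequence[float]
) -> Tuple[float, List[float]]:
    """Blend one prediction per strategy using softmax gate weights."""
    weights = softmax(logits)
    blended = sum(w * p for w, p in zip(weights, strategy_preds))
    return blended, weights


def regularized_loss(base_loss: float, weights: Sequence[float], beta: float = 0.01) -> float:
    """Assumed convention: subtracting entropy lowers the loss for more even weights."""
    return base_loss - beta * entropy(weights)


# Five strategies (PVA, MRD, VBP, MSA, MTS) with equal gate logits.
blended, weights = gated_prediction([0.0] * 5, [0.2, 0.4, 0.6, 0.8, 1.0])
```

With equal logits the gate reduces to a plain average, which is a useful sanity check before training.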

TASK-3.2: LLM Integration ✅

Subtask	Status	Start	End	Notes
3.2.1 Prompt structure	✅	2026-01-25	2026-01-25	prompts/trading_decision.py
3.2.2 Signal Formatter	✅	2026-01-25	2026-01-25	signal_formatter.py
3.2.3 Integrate LLM Agent	✅	2026-01-25	2026-01-25	llm_client.py (Ollama + Claude fallback)
3.2.4 Signal Logger	✅	2026-01-25	2026-01-25	signal_logger.py + DDL ml.llm_signals
3.2.5 Document flow	✅	2026-01-25	2026-01-25	integration.py + decision_parser.py

1.4 PHASE 4: VALIDATION

TASK-4.1: Backtesting Validation

Subtask	Status	Start	End	Notes
4.1.1-4.1.6	Pending	-	-	-

2. FILES CREATED

Phase 1.1 - Data Pipeline

File	Type	Lines	Commit
apps/ml-engine/src/data/training_loader.py	module	~300	pending
apps/ml-engine/src/data/dataset.py	module	~250	pending
apps/ml-engine/src/data/validators.py	module	~200	pending
apps/ml-engine/src/data/__init__.py	init	~50	pending
apps/data-service/scripts/migrate_historical_data.py	script	~400	pending
docs/.../implementacion/DATA-PIPELINE-SPEC.md	docs	~200	pending

Phase 1.2 - Attention Architecture

File	Type	Lines	Commit
apps/ml-engine/src/models/attention/multi_head_attention.py	module	~300	pending
apps/ml-engine/src/models/attention/positional_encoding.py	module	~300	pending
apps/ml-engine/src/models/attention/price_attention.py	module	~400	pending
apps/ml-engine/src/models/attention/attention_extractor.py	module	~500	pending
apps/ml-engine/src/models/attention/__init__.py	init	~100	pending
apps/ml-engine/tests/test_attention_architecture.py	tests	~600	pending

Phase 1 total: 12 files, ~3,600 lines

Phase 2 - Model Strategies

PVA (Price Variation Attention)

File	Lines
strategies/pva/feature_engineering.py	~700
strategies/pva/model.py	~920
strategies/pva/trainer.py	~790
strategies/pva/__init__.py	~110

MRD (Momentum Regime Detection)

File	Lines
strategies/mrd/feature_engineering.py	~540
strategies/mrd/hmm_regime.py	~450
strategies/mrd/model.py	~600
strategies/mrd/trainer.py	~530
strategies/mrd/__init__.py	~85

VBP (Volatility Breakout Predictor)

File	Lines
strategies/vbp/feature_engineering.py	~500
strategies/vbp/cnn_encoder.py	~400
strategies/vbp/model.py	~500
strategies/vbp/trainer.py	~450
strategies/vbp/__init__.py	~80

MSA (Market Structure Analysis)

File	Lines
strategies/msa/structure_detector.py	~800
strategies/msa/feature_engineering.py	~570
strategies/msa/model.py	~470
strategies/msa/trainer.py	~470
strategies/msa/__init__.py	~90

MTS (Multi-Timeframe Synthesis)

File	Lines
strategies/mts/feature_engineering.py	~500
strategies/mts/hierarchical_attention.py	~450
strategies/mts/model.py	~500
strategies/mts/trainer.py	~480
strategies/mts/__init__.py	~85

Phase 2 total: 24 files, ~11,000 lines

3. FILES MODIFIED

(To be updated during execution)

File	Change	Commit
-	-	-

4. VALIDATIONS

Validation	Status	Output
Build ML Engine	Pending	-
Tests ML Engine	Pending	-
Lint Python	Pending	-
Backtesting	Pending	-

5. PROGRESS METRICS

Phase	Subtasks	Completed	%
PHASE 1	8	8	100% ✅
PHASE 2	30	30	100% ✅
PHASE 3	10	10	100% ✅
PHASE 4	6	0	0%
TOTAL	54	48	89%

6. ISSUES AND BLOCKERS

(To be updated during execution)

ID	Description	Severity	Status	Resolution
-	-	-	-	-

7. COMMITS

(To be updated during execution)

Hash	Message	Date
-	-	-

Next action: Start PHASE 4 - Backtesting Validation (TASK-4.1); Phases 1 through 3 are complete.
