---
title: "Technical Specification - Minimum Viable Refactoring"
version: "1.0.0"
date: "2026-01-06"
status: "Proposed"
author: "ML-Specialist + Orquestador"
epic: "OQI-006"
tags: ["refactoring", "integration", "ml", "attention"]
---

# MINIMUM VIABLE REFACTORING - ML PREDICTION SYSTEM

## 1. EXECUTIVE SUMMARY

This document defines a low-risk refactoring plan that integrates the existing ML infrastructure to achieve immediate improvements in win rate and R:R ratio.

**Objective**: Raise the win rate from 33-44% to 60%+ with minimal changes.

**Principle**: Do NOT rewrite; INTEGRATE existing code.

---

## 2. PROPOSED CHANGES

### 2.1 CHANGE 1: Load Trained Models (Priority HIGH)

**File**: `src/services/prediction_service.py`

**Before (line ~157)**:
```python
from ..models.range_predictor import RangePredictor
self._range_predictor = RangePredictor()  # Generic model
```

**After**:
```python
import logging
from pathlib import Path
from typing import Dict

import pandas as pd

from ..models.range_predictor import RangePredictor
from ..training.symbol_timeframe_trainer import SymbolTimeframeTrainer

logger = logging.getLogger(__name__)


class PredictionService:
    def __init__(self):
        # Keep the legacy predictor so predict_range() can fall back to it
        self._range_predictor = RangePredictor()
        self._trainers: Dict[str, SymbolTimeframeTrainer] = {}
        self._load_trained_models()

    def _load_trained_models(self):
        """Load trained models per symbol/timeframe"""
        models_path = Path(__file__).parent.parent.parent / 'models' / 'ml_first'
        if not models_path.is_dir():
            logger.warning(f"Models directory not found: {models_path}")
            return

        for symbol_dir in models_path.iterdir():
            if symbol_dir.is_dir():
                symbol = symbol_dir.name
                trainer = SymbolTimeframeTrainer()
                trainer.load(str(symbol_dir))
                self._trainers[symbol] = trainer
                logger.info(f"✅ Loaded models for {symbol}")

    def predict_range(self, symbol: str, timeframe: str, df: pd.DataFrame):
        """Use the symbol-specific model when one is loaded"""
        if symbol in self._trainers:
            return self._trainers[symbol].predict(df, symbol, timeframe)
        # Fallback to the legacy model
        return self._range_predictor.predict(df)
```

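The directory-driven loading and per-symbol fallback of Change 1 can be exercised without the real trainers. The following is a minimal sketch under stated assumptions, not the project's implementation: `discover_symbols` and `ModelRegistry` are hypothetical names, and plain callables stand in for `SymbolTimeframeTrainer` and the legacy `RangePredictor`. It also shows one way to tolerate a missing models directory, which `Path.iterdir()` would otherwise report by raising an exception.

```python
from pathlib import Path
from typing import Callable, Dict, List


def discover_symbols(models_path: Path) -> List[str]:
    """Return the sub-directory names (one per symbol), or [] if the path is missing."""
    if not models_path.is_dir():
        return []
    return sorted(p.name for p in models_path.iterdir() if p.is_dir())


class ModelRegistry:
    """Hypothetical stand-in for the per-symbol dispatch in PredictionService."""

    def __init__(self, fallback: Callable[[list], str]):
        self._fallback = fallback
        self._models: Dict[str, Callable[[list], str]] = {}

    def register(self, symbol: str, predict_fn: Callable[[list], str]) -> None:
        self._models[symbol] = predict_fn

    def predict(self, symbol: str, data: list) -> str:
        # Use the symbol-specific model when registered, otherwise the legacy fallback
        return self._models.get(symbol, self._fallback)(data)


registry = ModelRegistry(fallback=lambda data: "legacy")
registry.register("XAUUSD", lambda data: "xauusd-model")
print(registry.predict("XAUUSD", []))   # symbol-specific path
print(registry.predict("UNKNOWN", []))  # falls back to the legacy callable
```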
**Estimated impact**: +5-10% accuracy
**Risk**: Low (fallback to legacy)
**Effort**: 2 hours

---

### 2.2 CHANGE 2: Remove Hardcoded Factors (Priority HIGH)

**File**: `src/models/range_predictor_factor.py`

**Before (lines 598-601)**:
```python
class PriceDataGenerator:
    SYMBOLS = {
        'XAUUSD': {'base': 2650.0, 'volatility': 0.0012, 'factor': 2.5},
        'EURUSD': {'base': 1.0420, 'volatility': 0.0004, 'factor': 0.0003},
    }
```

**After**:
```python
from ..training.symbol_timeframe_trainer import SYMBOL_CONFIGS


class PriceDataGenerator:
    def __init__(self, symbol: str, seed: int = 42):
        self.symbol = symbol
        # Use the centralized configuration
        config = SYMBOL_CONFIGS.get(symbol)
        if config:
            self.config = {
                # Placeholder base price; intended to come from the current (dynamic) price
                'base': 2650.0 if symbol == 'XAUUSD' else 1.0,
                'volatility': config.base_factor / 1000,  # Normalize
                'factor': config.base_factor
            }
        else:
            # Default for new symbols
            self.config = self._compute_dynamic_config(symbol)
```

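`_compute_dynamic_config` is referenced in Change 2 but not defined in this document. One possible shape is sketched below, purely as an assumption: the standalone function `compute_dynamic_config` is hypothetical, it takes recent closing prices as input (the original method receives only the symbol), and it defines `base` as the latest close, `volatility` as the standard deviation of simple returns, and `factor` as the typical absolute move `base * volatility`. None of these definitions is confirmed by the source.

```python
import statistics
from typing import Dict, Sequence


def compute_dynamic_config(closes: Sequence[float]) -> Dict[str, float]:
    """Derive base/volatility/factor from recent closes (illustrative definitions)."""
    if len(closes) < 3:
        raise ValueError("need at least 3 closing prices")
    if any(c <= 0 for c in closes):
        raise ValueError("closing prices must be positive")

    # Simple bar-to-bar returns
    returns = [(b - a) / a for a, b in zip(closes, closes[1:])]
    base = closes[-1]
    volatility = statistics.pstdev(returns)
    return {
        'base': base,
        'volatility': volatility,
        'factor': base * volatility,  # assumed: typical absolute move per bar
    }


config = compute_dynamic_config([100.0, 101.0, 100.0, 101.0])
print(config)
```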
**Impact**: Support for 5+ symbols (vs. 2)
**Risk**: Low
**Effort**: 1 hour

---

### 2.3 CHANGE 3: Enable Directional Filters (Priority HIGH)

**File**: `src/models/signal_generator.py`

**Add filters based on successful backtests**:
```python
from typing import Dict, Tuple


class DirectionalFilters:
    """Directional filters validated in backtests"""

    @staticmethod
    def is_short_valid(indicators: Dict, symbol: str) -> Tuple[bool, int]:
        """
        Validate a SHORT signal (2+ confirmations)

        Returns:
            (is_valid, confirmation_count)
        """
        confirmations = 0

        # Filters that worked on XAUUSD
        if indicators.get('rsi', 50) > 55:
            confirmations += 1
        if indicators.get('sar_above_price', False):
            confirmations += 1
        if indicators.get('cmf', 0) < 0:
            confirmations += 1
        if indicators.get('mfi', 50) > 55:
            confirmations += 1

        return confirmations >= 2, confirmations

    @staticmethod
    def is_long_valid(indicators: Dict, symbol: str) -> Tuple[bool, int]:
        """
        Validate a LONG signal (3+ confirmations, stricter)
        """
        confirmations = 0

        if indicators.get('rsi', 50) < 35:
            confirmations += 1
        if not indicators.get('sar_above_price', True):
            confirmations += 1
        if indicators.get('cmf', 0) > 0.1:
            confirmations += 1
        if indicators.get('mfi', 50) < 35:
            confirmations += 1

        return confirmations >= 3, confirmations


# In SignalGenerator.generate()
def generate(self, df: pd.DataFrame, symbol: str, ...):
    # ... existing code ...

    # Apply directional filters
    indicators = self._compute_indicators(df)

    if direction == Direction.SHORT:
        is_valid, conf_count = DirectionalFilters.is_short_valid(indicators, symbol)
        if not is_valid:
            return self._neutral_signal()
        confidence *= (1 + 0.1 * conf_count)  # Boost per confirmation

    elif direction == Direction.LONG:
        is_valid, conf_count = DirectionalFilters.is_long_valid(indicators, symbol)
        if not is_valid:
            return self._neutral_signal()
        confidence *= (1 + 0.1 * conf_count)

    # ... rest of the code ...
```

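The confidence boost in Change 3 multiplies by `1 + 0.1 * conf_count`, so a signal with four confirmations is scaled by 1.4. If `confidence` is meant to stay within 0 to 1 (an assumption; the document does not state its range), the boosted value can exceed 1.0. The sketch below works through the arithmetic and adds an optional cap. The helper name `boosted_confidence` and the `cap` parameter are hypothetical and not part of the original code.

```python
def boosted_confidence(confidence: float, conf_count: int,
                       boost_per_confirmation: float = 0.1,
                       cap: float = 1.0) -> float:
    """Scale confidence by (1 + boost * confirmations), clamped to `cap`."""
    if conf_count < 0:
        raise ValueError("conf_count must be non-negative")
    return min(cap, confidence * (1 + boost_per_confirmation * conf_count))


# SHORT signal with 2 confirmations: 0.60 * 1.2, roughly 0.72
short_conf = boosted_confidence(0.60, 2)

# LONG signal with 4 confirmations: 0.80 * 1.4 would be about 1.12, so the cap applies
long_conf = boosted_confidence(0.80, 4)

print(short_conf, long_conf)
```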
**Impact**: +10-15% win rate (demonstrated in backtests)
**Risk**: Low (only filters signals; does not change existing logic)
**Effort**: 3 hours

---

### 2.4 CHANGE 4: Integrate DynamicFactorWeighter (Priority MEDIUM)

**File**: `src/models/enhanced_range_predictor.py` and others

**Add attention weighting to XGBoost**:
```python
from typing import Optional

import numpy as np
import pandas as pd

from ..training.dynamic_factor_weighting import DynamicFactorWeighter, DynamicFactorConfig


class AttentionWeightedPredictor:
    """Wrapper that adds attention to any model"""

    def __init__(self, base_model, config: Optional[DynamicFactorConfig] = None):
        self.base_model = base_model
        self.weighter = DynamicFactorWeighter(config or DynamicFactorConfig())

    def fit(self, X: np.ndarray, y: np.ndarray, df: pd.DataFrame):
        """Train with attention sample weights"""
        # Compute attention weights
        weights = self.weighter.compute_weights(df)

        # Train the base model with the weights
        self.base_model.fit(X, y, sample_weight=weights)

        return self

    def predict(self, X: np.ndarray, df: Optional[pd.DataFrame] = None):
        """Predict (optionally also return the attention weights)"""
        predictions = self.base_model.predict(X)

        if df is not None:
            weights = self.weighter.compute_weights(df, normalize=False)
            return predictions, weights

        return predictions
```

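The internals of `DynamicFactorWeighter.compute_weights` are not shown in this document. To make the idea of attention sample weights concrete, the sketch below uses one plausible rule as an assumption: each bar is weighted by its absolute close-to-close move relative to the trailing average move, clipped at `w_max` (3.0, the bound checked in the unit tests of section 4.1). The function name `attention_weights` and the `window` parameter are hypothetical, and the real weighter may work differently.

```python
from typing import List, Sequence


def attention_weights(closes: Sequence[float], window: int = 20,
                      w_max: float = 3.0) -> List[float]:
    """Weight each bar by |move| / trailing mean |move|, clipped to w_max."""
    # moves[i] is the absolute change from bar i-1 to bar i; bar 0 has no move
    moves = [0.0] + [abs(b - a) for a, b in zip(closes, closes[1:])]

    weights = []
    for i, move in enumerate(moves):
        # Trailing moves before bar i, skipping the artificial 0.0 at index 0
        history = moves[max(1, i - window):i]
        avg = sum(history) / len(history) if history else 0.0
        # With no usable history, use a neutral weight of 1.0
        weight = move / avg if avg > 0 else 1.0
        weights.append(min(weight, w_max))
    return weights


# Three ordinary bars followed by a move ten times larger than usual
print(attention_weights([100.0, 101.0, 102.0, 103.0, 113.0]))
```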
**Impact**: +5-10% focus on significant moves
**Risk**: Medium (requires retraining)
**Effort**: 4 hours

---

## 3. IMPLEMENTATION ORDER

```
Week 1:
├── Day 1-2: CHANGE 3 - Directional filters
│   └── Test: Backtest XAUUSD with filters
├── Day 3-4: CHANGE 1 - Load trained models
│   └── Test: Verify per-symbol predictions
└── Day 5: CHANGE 2 - Remove hardcoding
    └── Test: Add BTCUSD as a new symbol

Week 2:
├── Day 1-3: CHANGE 4 - Integrate DynamicFactorWeighter
│   └── Test: Retrain XAUUSD 5m model with attention
└── Day 4-5: Full backtesting + adjustments
```

---

## 4. REGRESSION TESTS

### 4.1 Unit Tests

```python
# tests/test_prediction_integration.py
# Import paths assume the tests run from the repository root; adjust to the real layout.
from src.models.signal_generator import DirectionalFilters
from src.services.prediction_service import PredictionService
from src.training.dynamic_factor_weighting import DynamicFactorWeighter

# create_sample_ohlcv is a test helper that this document does not define.


def test_symbol_specific_model_loaded():
    """Verify that per-symbol models are loaded"""
    service = PredictionService()
    assert 'XAUUSD' in service._trainers
    assert service._trainers['XAUUSD'] is not None


def test_directional_filters_short():
    """Verify SHORT filters"""
    indicators = {'rsi': 60, 'sar_above_price': True, 'cmf': -0.1, 'mfi': 60}
    is_valid, count = DirectionalFilters.is_short_valid(indicators, 'XAUUSD')
    assert is_valid
    assert count >= 2


def test_directional_filters_long():
    """Verify LONG filters (stricter)"""
    indicators = {'rsi': 30, 'sar_above_price': False, 'cmf': 0.15, 'mfi': 30}
    is_valid, count = DirectionalFilters.is_long_valid(indicators, 'XAUUSD')
    assert is_valid
    assert count >= 3


def test_attention_weights_computation():
    """Verify attention weight computation"""
    df = create_sample_ohlcv(n=500)
    weighter = DynamicFactorWeighter()
    weights = weighter.compute_weights(df)

    assert len(weights) == len(df)
    assert weights.mean() > 0
    assert weights.max() <= 3.0  # w_max


def test_fallback_to_legacy():
    """Verify fallback for untrained symbols"""
    service = PredictionService()
    df = create_sample_ohlcv()

    # Untrained symbol
    result = service.predict_range('UNKNOWN', '5m', df)
    assert result is not None  # Fallback works
```

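The unit tests above call `create_sample_ohlcv` without defining it. A minimal, dependency-free sketch of such a helper follows. It is an assumption, not the project's fixture: the name `create_sample_ohlcv_rows`, the column names (`open`, `high`, `low`, `close`, `volume`), and the random-walk model are all hypothetical. The default start price and volatility reuse the XAUUSD values from section 2.2.

```python
import random
from typing import Dict, List


def create_sample_ohlcv_rows(n: int = 500, start: float = 2650.0,
                             volatility: float = 0.0012,
                             seed: int = 42) -> List[Dict[str, float]]:
    """Generate n synthetic OHLCV bars from a seeded Gaussian random walk."""
    rng = random.Random(seed)  # local generator keeps the output reproducible
    rows = []
    prev_close = start
    for _ in range(n):
        open_ = prev_close
        close = open_ * (1 + rng.gauss(0, volatility))
        # Wicks extend beyond the bar body by a small random amount
        high = max(open_, close) * (1 + abs(rng.gauss(0, volatility / 2)))
        low = min(open_, close) * (1 - abs(rng.gauss(0, volatility / 2)))
        rows.append({
            'open': open_, 'high': high, 'low': low, 'close': close,
            'volume': float(rng.randint(100, 1000)),
        })
        prev_close = close
    return rows


rows = create_sample_ohlcv_rows(n=5)
print(rows[0])
```

Wrapping the result in `pd.DataFrame(rows)` would give the DataFrame shape that the tests pass to `predict_range` and `compute_weights`.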
### 4.2 Integration Tests

```python
# tests/test_backtesting_regression.py

def test_xauusd_5m_win_rate():
    """Verify the win rate does not decrease"""
    results = run_backtest('XAUUSD', '5m', period='2025-01')

    # Baseline: 44% with current filters
    assert results['win_rate'] >= 0.40


def test_xauusd_5m_profit_factor():
    """Verify profit factor"""
    results = run_backtest('XAUUSD', '5m', period='2025-01')

    # Baseline: 1.07
    assert results['profit_factor'] >= 1.0


def test_attention_improves_signal_quality():
    """Verify that attention improves signal selection"""
    # Without attention
    signals_no_attn = generate_signals(use_attention=False)

    # With attention
    signals_with_attn = generate_signals(use_attention=True)

    # There should be fewer signals, but of higher quality
    assert len(signals_with_attn) <= len(signals_no_attn)
    assert signals_with_attn.mean_confidence >= signals_no_attn.mean_confidence
```

### 4.3 Performance Tests

```python
# tests/test_performance.py
import time


def test_prediction_latency():
    """Verify prediction latency < 100ms"""
    service = PredictionService()
    df = create_sample_ohlcv(n=500)

    start = time.perf_counter()
    for _ in range(100):
        service.predict_range('XAUUSD', '5m', df)
    elapsed = (time.perf_counter() - start) / 100

    assert elapsed < 0.1  # < 100ms per prediction (mean over 100 calls)


def test_model_loading_time():
    """Verify loading time < 5 seconds"""
    start = time.perf_counter()
    service = PredictionService()
    elapsed = time.perf_counter() - start

    assert elapsed < 5.0
```

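The latency test above checks the mean over 100 calls, while the alert in section 7.2 is expressed as a p99. A per-call measurement makes both available. The sketch below is one way to do it using only the standard library; `measure_latencies` and `percentile` are hypothetical helper names, and the nearest-rank percentile is a choice made here, not something the document specifies. The lambda stands in for a call such as `service.predict_range('XAUUSD', '5m', df)`.

```python
import math
import time
from typing import Callable, List, Sequence


def measure_latencies(fn: Callable[[], object], runs: int = 100) -> List[float]:
    """Call fn() `runs` times and return each call's duration in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Placeholder workload standing in for a prediction call
latencies = measure_latencies(lambda: sum(range(1000)), runs=50)
mean_latency = sum(latencies) / len(latencies)
p99_latency = percentile(latencies, 99)
print(f"mean={mean_latency:.6f}s p99={p99_latency:.6f}s")
```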
---

## 5. SUCCESS CRITERIA

| Metric | Baseline | Post-Refactoring | Final Target |
|--------|----------|------------------|--------------|
| Win Rate | 33-44% | ≥ 55% | 80% |
| Profit Factor | 1.07 | ≥ 1.2 | 1.8 |
| R:R Ratio | 1.2:1 | ≥ 1.8:1 | 2.5:1 |
| Latency | 50ms | < 100ms | < 50ms |
| Symbols | 2-3 | 5+ | 100+ |

---

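The first three metrics in the success-criteria table of section 5 can be computed from a list of per-trade profit and loss values. The sketch below uses the standard definitions of win rate (winning trades over all trades) and profit factor (gross profit over gross loss). For R:R it uses the realized ratio of average win to average loss, which is an assumption: the document does not say whether R:R means the realized or the planned take-profit/stop-loss ratio. The function name `summarize_trades` is hypothetical.

```python
from typing import Dict, Sequence


def summarize_trades(pnls: Sequence[float]) -> Dict[str, float]:
    """Compute win rate, profit factor and realized R:R from per-trade PnL."""
    wins = [p for p in pnls if p > 0]
    losses = [-p for p in pnls if p < 0]  # stored as positive magnitudes
    if not wins or not losses:
        raise ValueError("need at least one winning and one losing trade")

    return {
        'win_rate': len(wins) / len(pnls),
        'profit_factor': sum(wins) / sum(losses),
        # Assumed definition: average win divided by average loss
        'rr_ratio': (sum(wins) / len(wins)) / (sum(losses) / len(losses)),
    }


# Two winners of 2.0 and three losers of 1.0
stats = summarize_trades([2.0, -1.0, 2.0, -1.0, -1.0])
print(stats)
```

With these five trades the win rate is 40%, yet the profit factor is above 1 because each win is twice the size of each loss, which is why the table tracks win rate and R:R together.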
## 6. ROLLBACK PLAN

If a regression occurs:

1. **Immediate rollback**: Revert to the previous state of the `main` branch
2. **In-code fallback**: Each change has a fallback to legacy behavior
3. **Feature flags**:
   ```python
   import os

   USE_TRAINED_MODELS = os.getenv('USE_TRAINED_MODELS', 'true') == 'true'
   USE_DIRECTIONAL_FILTERS = os.getenv('USE_DIRECTIONAL_FILTERS', 'true') == 'true'
   USE_ATTENTION_WEIGHTING = os.getenv('USE_ATTENTION_WEIGHTING', 'false') == 'true'
   ```

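The feature flags above compare the raw environment value to the exact string `'true'`, so values such as `True` or `1` silently evaluate to false. During a rollback that could leave a flag in the opposite state from what the operator intended. A more tolerant parser is sketched below as a suggestion only; the helper name `env_flag`, the accepted spellings, and the choice to raise on unrecognized values are assumptions, not part of the original plan.

```python
import os
from typing import Mapping, Optional

_TRUE_VALUES = {'1', 'true', 'yes', 'on'}
_FALSE_VALUES = {'0', 'false', 'no', 'off'}


def env_flag(name: str, default: bool,
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean feature flag, accepting common spellings case-insensitively."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    # Fail loudly rather than silently disabling a feature
    raise ValueError(f"{name}={raw!r} is not a recognized boolean value")


USE_TRAINED_MODELS = env_flag('USE_TRAINED_MODELS', default=True)
USE_DIRECTIONAL_FILTERS = env_flag('USE_DIRECTIONAL_FILTERS', default=True)
USE_ATTENTION_WEIGHTING = env_flag('USE_ATTENTION_WEIGHTING', default=False)
```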
---

## 7. POST-DEPLOY MONITORING

### 7.1 Metrics to Monitor

```python
# Add to prediction_service.py
from prometheus_client import Counter, Histogram

PREDICTIONS_TOTAL = Counter('ml_predictions_total', 'Total predictions', ['symbol', 'direction'])
PREDICTION_LATENCY = Histogram('ml_prediction_latency_seconds', 'Prediction latency')
SIGNALS_FILTERED = Counter('ml_signals_filtered', 'Signals filtered by direction', ['reason'])
```

### 7.2 Alerts

| Alert | Condition | Action |
|-------|-----------|--------|
| Win Rate Drop | win_rate < 0.40 for 24h | Review filters |
| Latency High | p99 > 200ms | Check model loading |
| No Signals | 0 signals in 8h | Check filters/data |

---

*Document generated: 2026-01-06*
*Status: Proposed for review*