

Analysis Report: RangePredictor Negative R^2

Trading Platform - Sprint 1, Task S1-T1

Date: 2026-01-07 Executor: Claude Opus 4.5 (ML-SPECIALIST) Status: COMPLETED

1. EXECUTIVE SUMMARY

1.1 Problem Identified

The RangePredictor model shows a negative R^2 in every evaluation:

Model	Symbol	Timeframe	Target	R^2
GBPUSD_5m_high_h3	GBPUSD	5m	high	-0.6309
GBPUSD_5m_low_h3	GBPUSD	5m	low	-0.6558
GBPUSD_15m_high_h3	GBPUSD	15m	high	-0.6944
GBPUSD_15m_low_h3	GBPUSD	15m	low	-0.7500

Interpretation: A negative R^2 means the model predicts WORSE than a constant prediction equal to the mean of the target, which is the baseline R^2 is measured against.
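To make the interpretation concrete, here is a minimal sketch of the standard R^2 formula, 1 - SS_res / SS_tot, in plain Python. The toy values are illustrative only and are not taken from the report's data; the function name is hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, chosen only to illustrate the sign of R^2
y_true = [1.0, 2.0, 3.0, 4.0]
mean_baseline = [2.5, 2.5, 2.5, 2.5]   # always predict the mean of y_true
biased_model = [4.0, 4.0, 4.0, 4.0]    # constant but biased prediction

r2_baseline = r2_score(y_true, mean_baseline)  # exactly 0 by construction
r2_biased = r2_score(y_true, biased_model)     # negative: worse than the mean
```

A model whose predictions are systematically shifted away from the target mean, as in the biased case here, lands below zero even if it is otherwise stable.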

1.2 Impact

Range predictions are unusable for trading
The ML signal system is not operational
Backtesting shows a low win rate (42.1%)

2. ROOT CAUSE ANALYSIS

2.1 Cause 1: Normalized Targets with Incorrect Scale

File: src/data/targets.py

Finding: The target is computed in absolute values (USD) while the features are normalized. During training, the target values are very small (0.0005 - 0.001) owing to implicit normalization.

# Lines 206-207 of targets.py
df[f'delta_high_{horizon.name}'] = future_high - df['close']  # Values in USD
df[f'delta_low_{horizon.name}'] = df['close'] - future_low   # Values in USD

Problem:

For GBPUSD, delta_high could be 0.0005 (5 pips)
The XGBoost model has difficulty with such small values
The variance of the target is minimal compared with the noise

Proposed Solution:

Normalize targets by ATR before training
Use targets in pips or points instead of absolute price
Scale features and targets consistently
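As a small illustration of the pips option, the sketch below converts a price delta into pips. The `PIP_SIZE` table and the function name are hypothetical; the 0.0001 pip size for GBPUSD is the usual convention for four-decimal pairs and matches the "0.0005 (5 pips)" example above.

```python
# Assumption: standard pip size for a four-decimal FX pair
PIP_SIZE = {"GBPUSD": 0.0001}


def price_delta_to_pips(delta, symbol):
    """Express a price difference in pips for the given symbol."""
    return delta / PIP_SIZE[symbol]


delta_high = 0.0005
delta_high_pips = price_delta_to_pips(delta_high, "GBPUSD")  # about 5 pips
```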

2.2 Cause 2: Features Not Predictive of the Target

File: src/data/features.py

Finding: The features are mainly technical indicators (RSI, MACD, Bollinger) that:

Are lagging indicators (based on past price)
Have no direct relationship with the future range
Are designed for direction, not magnitude

Current Features (Lines 17-27):

'minimal': [
    'rsi', 'macd', 'macd_signal', 'bb_upper', 'bb_lower',
    'atr', 'volume_zscore', 'returns', 'log_returns'
]

Problem:

RSI predicts overbought/oversold conditions, NOT future range
MACD predicts trend, NOT magnitude
Only atr is related to future volatility

Proposed Solution:

Add volatility features: ATR lags, historical volatility
Add session features: hour, day of week (cyclically encoded)
Add volatility momentum features: change in ATR
Reduce direction features that are not relevant
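The cyclical encoding mentioned in the list above can be sketched as follows. It maps a periodic value onto the unit circle so that the end of a cycle sits next to its start. The helper names are hypothetical, and using a period of 7 for day of week is an assumption (a 5-day trading week may suit FX data better).

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value to (sin, cos) coordinates on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


hour_23 = cyclical_encode(23, 24)
hour_0 = cyclical_encode(0, 24)
hour_12 = cyclical_encode(12, 24)
monday = cyclical_encode(0, 7)   # assumption: 0 = Monday, period of 7 days

# 23:00 and 00:00 are neighbours in the encoded space; 00:00 and 12:00 are opposite
near = distance(hour_23, hour_0)
far = distance(hour_0, hour_12)
```

With a raw integer hour, 23 and 0 would look maximally far apart, which is the distortion this encoding avoids.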

2.3 Cause 3: Aggressive Sample Weighting

File: src/training/sample_weighting.py

Finding: The sample weighting (softplus with beta=4.0) is very aggressive:

It reduces the weight of "normal" moves almost to zero
It effectively trains only on extreme moves
This biases predictions toward high ranges

Current Configuration (Lines 66-69):

softplus_beta: float = 4.0      # VERY aggressive
softplus_w_max: float = 3.0

Problem:

The model learns from only 24-35% of the data (high flow periods)
Predictions are biased toward high values
Prediction variance is very low (it does not capture the real distribution)

Proposed Solution:

Reduce softplus_beta to 2.0 or less
Increase min_weight to include more samples
Consider uniform weighting as a baseline
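The effect of beta can be illustrated with a minimal sketch. The report does not show the exact formula used in sample_weighting.py, so the mapping below is an assumption: the weight is a softplus of (move ratio - 1), clipped to [min_weight, w_max], where the move ratio is the candle range divided by its rolling median. The function names are hypothetical; the parameter names mirror the configuration fields.

```python
import math


def softplus(x, beta):
    """(1 / beta) * log(1 + exp(beta * x)), computed in a numerically stable way."""
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def sample_weight(move_ratio, beta, w_max, min_weight):
    """Assumed weighting: softplus of (ratio - 1), clipped to [min_weight, w_max]."""
    raw = softplus(move_ratio - 1.0, beta)
    return min(max(raw, min_weight), w_max)


# A "normal" move at half the rolling median range
raw_beta4 = softplus(0.5 - 1.0, beta=4.0)
raw_beta2 = softplus(0.5 - 1.0, beta=2.0)

# Current configuration versus an extreme move five times the median
w_normal = sample_weight(0.5, beta=4.0, w_max=3.0, min_weight=0.1)
w_extreme = sample_weight(5.0, beta=4.0, w_max=3.0, min_weight=0.1)
```

Under this assumed form, a larger beta pushes the raw weight of below-median moves closer to zero, so those samples only contribute through the min_weight floor while extreme moves saturate at w_max.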

2.4 Cause 4: Potential Data Leakage

File: src/training/sample_weighting.py, src/data/corrected_targets.py

Finding: Although shift(1) is applied to the rolling median factor, leakage is still possible in:

Targets that include the current price when computing future values
Features that implicitly use future data

Verification Required:

# Lines 126-129 of sample_weighting.py
factor = candle_range.rolling(
    window=window,
    min_periods=min_periods
).median().shift(1)  # Correct: uses shift(1)

# Lines 190-195 of targets.py - VERIFY
for i in range(start, end + 1):  # start=1, correct
    future_highs.append(df['high'].shift(-i))

Result: The targets code uses start_offset=1, which is correct. There is no evident data leakage in the targets, but the features still need to be verified.
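One way to check features for look-ahead is a perturbation test: change only the values after bar t and confirm that the feature at t does not move. The sketch below is a generic illustration with hypothetical helper names and toy feature functions; it does not use the project's actual feature code.

```python
def uses_future_data(feature_fn, values, t):
    """Return True if feature_fn(values)[t] changes when values after t are altered."""
    baseline = feature_fn(values)[t]
    perturbed = list(values)
    for i in range(t + 1, len(perturbed)):
        perturbed[i] += 1000.0  # large shift so any dependence is visible
    return feature_fn(perturbed)[t] != baseline


def trailing_mean(values, window=3):
    """Mean over the current bar and the bars before it (no look-ahead)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(values, window=3):
    """Mean over a window centered on the current bar (uses the next bar)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
trailing_leaks = uses_future_data(trailing_mean, prices, t=3)
centered_leaks = uses_future_data(centered_mean, prices, t=3)
```

The same check could be wrapped around each feature builder and run at several values of t, since a leak may only appear away from the edges of the series.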

2.5 Cause 5: XGBoost Hyperparameters Not Optimized

File: src/models/range_predictor.py

Current Configuration (Lines 146-162):

'xgboost': {
    'n_estimators': 200,
    'max_depth': 5,
    'learning_rate': 0.05,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'min_child_weight': 3,
    'gamma': 0.1,
    'reg_alpha': 0.1,
    'reg_lambda': 1.0,
}

Problem:

max_depth=5 may be too deep for noisy data
learning_rate=0.05 combined with n_estimators=200 may overfit
min_child_weight=3 may be too low

Proposed Solution:

Reduce max_depth to 3
Increase min_child_weight to 10
Increase regularization (reg_alpha, reg_lambda)
Use more aggressive early stopping

3. CORRECTION PLAN

3.1 Phase 1: Target Correction (Priority HIGH)

File: src/data/targets.py

Changes:

Normalize targets by ATR:

# Add normalization
df[f'delta_high_{horizon.name}_norm'] = (future_high - df['close']) / df['ATR']
df[f'delta_low_{horizon.name}_norm'] = (df['close'] - future_low) / df['ATR']

Use the normalized targets in training

Expected Benefit: Targets on a scale of [-3, 3] instead of [0, 0.001]
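The normalization step can be sketched in plain Python as below. Two details are assumptions that go beyond the snippet above: the divisor is the previous bar's ATR, following the shift(1) mitigation listed in the risks table, and bars with a missing or near-zero ATR are left undefined instead of producing a division error. With pandas the equivalent divisor would be `df['ATR'].shift(1)`. The function name and sample values are hypothetical.

```python
def normalize_by_atr(deltas, atr, eps=1e-12):
    """Divide each delta by the previous bar's ATR; None where it is unavailable."""
    out = []
    for i, delta in enumerate(deltas):
        prev_atr = atr[i - 1] if i > 0 else None
        if prev_atr is None or prev_atr <= eps:
            out.append(None)  # no usable ATR for this bar
        else:
            out.append(delta / prev_atr)
    return out


# Illustrative values on a GBPUSD-like price scale (not from the report's data)
delta_high = [0.0005, 0.0010, 0.0002]
atr = [0.0008, 0.0010, 0.0009]

delta_high_norm = normalize_by_atr(delta_high, atr)
```

The normalized values are expressed in multiples of ATR, which is also the unit used for the MAE thresholds in the success criteria.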

3.2 Phase 2: Feature Correction (Priority HIGH)

File: src/data/features.py

Changes:

Add volatility features:

'volatility': [
    'atr',
    'atr_ratio',  # ATR / rolling_median(ATR)
    'atr_pct_change',
    'range_pct',  # (high-low)/close
    'true_range',
    'realized_volatility_10',
    'realized_volatility_20'
]

Add session features (already present in create_time_features):

# Already implemented correctly
df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)

Use only features relevant to range prediction
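A few of the proposed volatility features can be sketched per bar as follows. True range uses its standard definition; `range_pct` and `atr_ratio` follow the comments in the feature list. Whether the rolling median in `atr_ratio` should include the current bar is not stated in the report, so excluding it here is an assumption, and all function names are hypothetical.

```python
from statistics import median


def true_range(high, low, prev_close):
    """Largest of high-low, |high - previous close| and |low - previous close|."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def range_pct(high, low, close):
    """Bar range relative to the close: (high - low) / close."""
    return (high - low) / close


def atr_ratio(atr_values, window):
    """Latest ATR divided by the median of the preceding `window` ATR values."""
    history = atr_values[-(window + 1):-1]  # assumption: current bar excluded
    return atr_values[-1] / median(history)


# Illustrative bar that gaps up from the previous close
tr = true_range(high=1.2710, low=1.2700, prev_close=1.2695)
rp = range_pct(high=1.2710, low=1.2700, close=1.2705)
ratio = atr_ratio([0.0008, 0.0010, 0.0012, 0.0020], window=3)
```

A ratio well above 1 flags a bar whose volatility is elevated relative to its recent history, which is the kind of signal a range target is expected to respond to.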

3.3 Phase 3: Sample Weighting Adjustment (Priority MEDIUM)

File: src/training/sample_weighting.py

Changes:

# Less aggressive configuration
SampleWeightConfig(
    softplus_beta=2.0,        # Reduced from 4.0
    softplus_w_max=2.0,       # Reduced from 3.0
    min_weight=0.3,           # Increased from 0.1
    filter_low_ratio=False    # Include all samples
)

3.4 Phase 4: Hyperparameter Optimization (Priority MEDIUM)

File: src/models/range_predictor.py

Changes:

'xgboost': {
    'n_estimators': 100,      # Reduced from 200
    'max_depth': 3,           # Reduced from 5
    'learning_rate': 0.03,    # Reduced from 0.05
    'subsample': 0.7,
    'colsample_bytree': 0.7,
    'min_child_weight': 10,   # Increased from 3
    'gamma': 0.5,             # Increased from 0.1
    'reg_alpha': 1.0,         # Increased from 0.1
    'reg_lambda': 10.0,       # Increased from 1.0
}

4. SUCCESS CRITERIA

Metric	Current Value	Minimum Acceptable	Target
R^2 (validation)	-0.65	> 0.05	> 0.15
MAE (normalized)	N/A	< 0.5 ATR	< 0.3 ATR
Directional accuracy	98%	> 60%	> 65%
Backtest Win Rate	42%	> 50%	> 55%

5. EXECUTION ORDER

S1-T2: Implement target normalization by ATR
S1-T3: Verify there is no data leakage in the features
S1-T4a: Reduce the aggressiveness of sample weighting
S1-T4b: Adjust XGBoost hyperparameters
S1-T5: Retrain models with the corrections
S1-T6: Validate R^2 > 0 on OOS data

6. FILES TO MODIFY

File	Type of Change	Estimated Lines
`src/data/targets.py`	Add normalization	+20
`src/data/features.py`	Add volatility features	+50
`src/training/sample_weighting.py`	Reduce aggressiveness	~10
`src/models/range_predictor.py`	Adjust hyperparameters	~15
`scripts/train_symbol_timeframe_models.py`	Use normalized targets	~20

7. RISKS

Risk	Probability	Mitigation
R^2 remains negative	Medium	Plan B: baseline model (moving average)
Normalization introduces leakage	Low	Use ATR with shift(1)
Overfitting to the new hyperparameters	Medium	Walk-forward validation

Report completed: 2026-01-07 Next step: S1-T2 - Implement target normalization
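The walk-forward validation named as a mitigation can be sketched as a rolling split generator: each test window immediately follows its training window, and the whole frame then slides forward by one test window. The function name and window sizes are hypothetical and would need tuning to the project's data volume.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train, test) index ranges where each test window follows its train window."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window


splits = [(list(tr), list(te)) for tr, te in walk_forward_splits(10, 4, 2)]
```

Because every test index is later than every training index in the same split, hyperparameters tuned this way are always scored on data the model has not yet seen in time order.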
