---
id: "FEATURES-TARGETS-COMPLETO"
title: "Complete Features and Targets - Trading Platform ML Models"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---

# Complete Features and Targets - Trading Platform ML Models

**Version:** 2.0.0
**Date:** 2025-12-08
**Module:** OQI-006-ml-signals
**Author:** Trading Strategist - Trading Platform

---

## Table of Contents

1. [Overview](#overview)
2. [Features by Category](#features-by-category)
3. [Targets by Model](#targets-by-model)
4. [Feature Engineering Pipeline](#feature-engineering-pipeline)
5. [Validation and Testing](#validation-and-testing)

---

## Overview

### Feature Summary

| Category | Count | Used by |
|----------|-------|---------|
| Price Action | 12 | AMD, Range, TPSL |
| Volume | 10 | AMD, Range, TPSL |
| Volatility | 8 | AMD, Range, TPSL |
| Trend | 10 | AMD, Range, TPSL |
| Market Structure | 12 | AMD, SMC |
| Order Flow | 10 | AMD, Liquidity |
| Liquidity | 8 | Liquidity, TPSL |
| ICT | 15 | ICT Context, Range |
| SMC | 12 | AMD, TPSL |
| Time | 6 | All |
| **TOTAL** | **103** | - |

### Target Summary

| Model | Target Type | Classes/Values | Horizon |
|-------|-------------|----------------|---------|
| AMDDetector | Multiclass | 4 (neutral, acc, manip, dist) | 20 bars |
| RangePredictor | Regression | delta_high, delta_low | 15m, 1h |
| TPSLClassifier | Binary | 0/1 (SL first / TP first) | Variable |
| LiquidityHunter | Binary | 0/1 (no sweep / sweep) | 10 bars |
| ICTContextModel | Continuous | 0-1 score | Current |

---

## Features by Category

### 1. Price Action Features (12)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `range_ratio` | `(high - low) / SMA(high - low, 20)` | float | AMD, Range |
| 2 | `range_pct` | `(high - low) / close` | float | AMD, Range |
| 3 | `body_size` | `abs(close - open) / (high - low + 1e-8)` | float | AMD |
| 4 | `upper_wick` | `(high - max(close, open)) / (high - low + 1e-8)` | float | AMD |
| 5 | `lower_wick` | `(min(close, open) - low) / (high - low + 1e-8)` | float | AMD |
| 6 | `buying_pressure` | `(close - low) / (high - low + 1e-8)` | float | AMD, Range |
| 7 | `selling_pressure` | `(high - close) / (high - low + 1e-8)` | float | AMD, Range |
| 8 | `close_position` | `(close - low) / (high - low + 1e-8)` | float | AMD |
| 9 | `range_expansion` | `range_ratio > 1.3` | binary | AMD |
| 10 | `range_compression` | `range_ratio < 0.7` | binary | AMD |
| 11 | `gap_up` | `open > high.shift(1)` | binary | Range |
| 12 | `gap_down` | `open < low.shift(1)` | binary | Range |

```python
import pandas as pd


def extract_price_action_features(df):
    """Extract price action features."""
    f = {}

    hl_range = df['high'] - df['low']
    hl_range_safe = hl_range.replace(0, 1e-8)

    f['range_ratio'] = hl_range / hl_range.rolling(20).mean()
    f['range_pct'] = hl_range / df['close']
    f['body_size'] = abs(df['close'] - df['open']) / hl_range_safe
    f['upper_wick'] = (df['high'] - df[['close', 'open']].max(axis=1)) / hl_range_safe
    f['lower_wick'] = (df[['close', 'open']].min(axis=1) - df['low']) / hl_range_safe
    f['buying_pressure'] = (df['close'] - df['low']) / hl_range_safe
    f['selling_pressure'] = (df['high'] - df['close']) / hl_range_safe
    f['close_position'] = f['buying_pressure']
    f['range_expansion'] = (f['range_ratio'] > 1.3).astype(int)
    f['range_compression'] = (f['range_ratio'] < 0.7).astype(int)
    f['gap_up'] = (df['open'] > df['high'].shift(1)).astype(int)
    f['gap_down'] = (df['open'] < df['low'].shift(1)).astype(int)

    return pd.DataFrame(f)
```

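As a quick sanity check of the wick/pressure formulas above, here is one hand-made bar (open=100, high=110, low=95, close=105); by construction, body plus both wicks sum to the full range. This is an illustrative sketch, not part of the extractor.

```python
import pandas as pd

# One hand-made bullish bar: open=100, high=110, low=95, close=105
df = pd.DataFrame({'open': [100.0], 'high': [110.0], 'low': [95.0], 'close': [105.0]})
hl_range = df['high'] - df['low']                                         # 15
body = (df['close'] - df['open']).abs() / hl_range                        # 5/15
upper_wick = (df['high'] - df[['close', 'open']].max(axis=1)) / hl_range  # (110-105)/15
lower_wick = (df[['close', 'open']].min(axis=1) - df['low']) / hl_range   # (100-95)/15
buying_pressure = (df['close'] - df['low']) / hl_range                    # (105-95)/15
```

Body and wick fractions always partition the bar: `body + upper_wick + lower_wick == 1` for any non-degenerate bar.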
### 2. Volume Features (10)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `volume_ratio` | `volume / SMA(volume, 20)` | float | AMD, Range |
| 2 | `volume_trend` | `SMA(volume, 10) - SMA(volume, 30)` | float | AMD |
| 3 | `volume_spike` | `volume > SMA(volume, 20) * 2` | binary | AMD |
| 4 | `obv` | On Balance Volume | float | AMD |
| 5 | `obv_slope` | `(OBV - OBV.shift(5)) / 5` | float | AMD |
| 6 | `vwap` | Volume Weighted Average Price | float | Range |
| 7 | `vwap_distance` | `(close - vwap) / vwap` | float | Range |
| 8 | `volume_on_up` | `sum(vol if close > open) / total_vol` (20 bars) | float | AMD |
| 9 | `volume_on_down` | `sum(vol if close < open) / total_vol` (20 bars) | float | AMD |
| 10 | `volume_imbalance` | `volume_on_up - volume_on_down` | float | AMD |

```python
import pandas as pd


def extract_volume_features(df):
    """Extract volume features."""
    f = {}

    f['volume_ratio'] = df['volume'] / df['volume'].rolling(20).mean()
    f['volume_trend'] = df['volume'].rolling(10).mean() - df['volume'].rolling(30).mean()
    f['volume_spike'] = (df['volume'] > df['volume'].rolling(20).mean() * 2).astype(int)

    # OBV (bars with close <= previous close contribute with sign -1)
    obv_direction = (df['close'] > df['close'].shift(1)).astype(int) * 2 - 1
    f['obv'] = (df['volume'] * obv_direction).cumsum()
    f['obv_slope'] = f['obv'].diff(5) / 5

    # VWAP (cumulative over the whole frame, not session-anchored)
    typical_price = (df['high'] + df['low'] + df['close']) / 3
    f['vwap'] = (typical_price * df['volume']).cumsum() / df['volume'].cumsum()
    f['vwap_distance'] = (df['close'] - f['vwap']) / f['vwap']

    # Volume distribution across up and down bars
    up_bars = (df['close'] > df['open']).astype(int)
    down_bars = (df['close'] < df['open']).astype(int)

    f['volume_on_up'] = (df['volume'] * up_bars).rolling(20).sum() / df['volume'].rolling(20).sum()
    f['volume_on_down'] = (df['volume'] * down_bars).rolling(20).sum() / df['volume'].rolling(20).sum()
    f['volume_imbalance'] = f['volume_on_up'] - f['volume_on_down']

    return pd.DataFrame(f)
```

### 3. Volatility Features (8)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `atr` | Average True Range (14) | float | All |
| 2 | `atr_ratio` | `ATR / SMA(ATR, 50)` | float | AMD, Range |
| 3 | `atr_percentile` | ATR percentile over 100 bars | float | Range |
| 4 | `volatility_10` | `std(returns, 10)` | float | Range |
| 5 | `volatility_20` | `std(returns, 20)` | float | Range |
| 6 | `volatility_50` | `std(returns, 50)` | float | Range |
| 7 | `volatility_ratio` | `volatility_10 / volatility_50` | float | AMD |
| 8 | `bollinger_width` | `(BB_upper - BB_lower) / BB_middle` | float | Range |

```python
import pandas as pd


def extract_volatility_features(df):
    """Extract volatility features."""
    f = {}

    # ATR (true range averaged over 14 bars)
    tr = pd.concat([
        df['high'] - df['low'],
        abs(df['high'] - df['close'].shift(1)),
        abs(df['low'] - df['close'].shift(1))
    ], axis=1).max(axis=1)

    f['atr'] = tr.rolling(14).mean()
    f['atr_ratio'] = f['atr'] / f['atr'].rolling(50).mean()
    f['atr_percentile'] = f['atr'].rolling(100).apply(
        lambda x: pd.Series(x).rank(pct=True).iloc[-1]
    )

    # Returns volatility
    returns = df['close'].pct_change()
    f['volatility_10'] = returns.rolling(10).std()
    f['volatility_20'] = returns.rolling(20).std()
    f['volatility_50'] = returns.rolling(50).std()
    f['volatility_ratio'] = f['volatility_10'] / f['volatility_50']

    # Bollinger Band width
    sma_20 = df['close'].rolling(20).mean()
    std_20 = df['close'].rolling(20).std()
    bb_upper = sma_20 + 2 * std_20
    bb_lower = sma_20 - 2 * std_20
    f['bollinger_width'] = (bb_upper - bb_lower) / sma_20

    return pd.DataFrame(f)
```

### 4. Trend Features (10)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `sma_10` | Simple Moving Average (10) | float | Range |
| 2 | `sma_20` | Simple Moving Average (20) | float | Range |
| 3 | `sma_50` | Simple Moving Average (50) | float | Range |
| 4 | `close_sma_10_ratio` | `close / SMA_10` | float | AMD |
| 5 | `close_sma_20_ratio` | `close / SMA_20` | float | AMD |
| 6 | `sma_slope_20` | `(SMA_20 - SMA_20.shift(5)) / 5` | float | AMD |
| 7 | `trend_strength` | `abs(sma_slope_20) / ATR` | float | AMD |
| 8 | `adx` | Average Directional Index (14) | float | AMD |
| 9 | `plus_di` | +DI (14) | float | Range |
| 10 | `minus_di` | -DI (14) | float | Range |

```python
import pandas as pd


def extract_trend_features(df):
    """Extract trend features."""
    f = {}

    # SMAs
    f['sma_10'] = df['close'].rolling(10).mean()
    f['sma_20'] = df['close'].rolling(20).mean()
    f['sma_50'] = df['close'].rolling(50).mean()

    f['close_sma_10_ratio'] = df['close'] / f['sma_10']
    f['close_sma_20_ratio'] = df['close'] / f['sma_20']
    f['sma_slope_20'] = f['sma_20'].diff(5) / 5

    # Trend strength, normalized by ATR
    atr = calculate_atr(df, 14)
    f['trend_strength'] = abs(f['sma_slope_20']) / atr

    # ADX
    f['adx'], f['plus_di'], f['minus_di'] = calculate_adx(df, 14)

    return pd.DataFrame(f)


def calculate_adx(df, period=14):
    """Compute ADX, +DI, -DI (directional movement)."""
    up_move = df['high'].diff()
    down_move = -df['low'].diff()

    # Evaluate both conditions on the raw moves before zeroing either series
    plus_dm = up_move.where((up_move > down_move) & (up_move > 0), 0)
    minus_dm = down_move.where((down_move > up_move) & (down_move > 0), 0)

    tr = pd.concat([
        df['high'] - df['low'],
        abs(df['high'] - df['close'].shift(1)),
        abs(df['low'] - df['close'].shift(1))
    ], axis=1).max(axis=1)

    atr = tr.rolling(period).mean()
    plus_di = 100 * (plus_dm.rolling(period).mean() / atr)
    minus_di = 100 * (minus_dm.rolling(period).mean() / atr)

    # Epsilon guards against division by zero when both DIs are 0
    dx = 100 * abs(plus_di - minus_di) / (plus_di + minus_di + 1e-8)
    adx = dx.rolling(period).mean()

    return adx, plus_di, minus_di
```

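The code above (and the target calculations later in this document) calls `calculate_atr(df, period)`, which is never defined here. A plausible sketch, matching the true-range construction used in `extract_volatility_features`, is:

```python
import pandas as pd

def calculate_atr(df, period=14):
    """Average True Range: rolling mean of the true range (assumed helper)."""
    tr = pd.concat([
        df['high'] - df['low'],
        (df['high'] - df['close'].shift(1)).abs(),
        (df['low'] - df['close'].shift(1)).abs(),
    ], axis=1).max(axis=1)
    return tr.rolling(period).mean()

# Quick check on flat synthetic data: true range collapses to high - low
data = pd.DataFrame({'high': [2.0] * 20, 'low': [1.0] * 20, 'close': [1.5] * 20})
atr_flat = calculate_atr(data, period=14)
```

Note this uses a simple rolling mean; the project's real helper may use Wilder's exponential smoothing instead, which gives slightly different values.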
### 5. Market Structure Features (12)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `higher_highs_count` | Count of HH over 20 bars | int | AMD |
| 2 | `higher_lows_count` | Count of HL over 20 bars | int | AMD |
| 3 | `lower_highs_count` | Count of LH over 20 bars | int | AMD |
| 4 | `lower_lows_count` | Count of LL over 20 bars | int | AMD |
| 5 | `swing_high_distance` | Distance to nearest swing high | float | Liquidity |
| 6 | `swing_low_distance` | Distance to nearest swing low | float | Liquidity |
| 7 | `bos_bullish_count` | Count of bullish BOS over 30 bars | int | SMC |
| 8 | `bos_bearish_count` | Count of bearish BOS over 30 bars | int | SMC |
| 9 | `choch_bullish_count` | Count of bullish CHOCH over 30 bars | int | SMC |
| 10 | `choch_bearish_count` | Count of bearish CHOCH over 30 bars | int | SMC |
| 11 | `structure_score` | Structure score (-1 to +1) | float | AMD |
| 12 | `structure_alignment` | Alignment with trend | binary | Range |

```python
import numpy as np
import pandas as pd


def extract_market_structure_features(df, lookback=20):
    """Extract market structure features (helpers such as detect_swing_points
    are defined elsewhere in the module)."""
    f = {}

    # Higher highs/lows, lower highs/lows
    f['higher_highs_count'] = (df['high'] > df['high'].shift(1)).rolling(lookback).sum()
    f['higher_lows_count'] = (df['low'] > df['low'].shift(1)).rolling(lookback).sum()
    f['lower_highs_count'] = (df['high'] < df['high'].shift(1)).rolling(lookback).sum()
    f['lower_lows_count'] = (df['low'] < df['low'].shift(1)).rolling(lookback).sum()

    # Swing distances
    swing_highs = detect_swing_points(df, 'high', lookback)
    swing_lows = detect_swing_points(df, 'low', lookback)

    f['swing_high_distance'] = calculate_distance_to_nearest(df['close'], swing_highs, 'above')
    f['swing_low_distance'] = calculate_distance_to_nearest(df['close'], swing_lows, 'below')

    # BOS and CHOCH counts
    bos_signals = detect_bos(df, lookback)
    choch_signals = detect_choch(df, lookback)

    f['bos_bullish_count'] = count_signals(bos_signals, 'bullish', 30)
    f['bos_bearish_count'] = count_signals(bos_signals, 'bearish', 30)
    f['choch_bullish_count'] = count_signals(choch_signals, 'bullish', 30)
    f['choch_bearish_count'] = count_signals(choch_signals, 'bearish', 30)

    # Structure score
    bullish_points = f['higher_highs_count'] + f['higher_lows_count']
    bearish_points = f['lower_highs_count'] + f['lower_lows_count']
    total_points = bullish_points + bearish_points + 1e-8
    f['structure_score'] = (bullish_points - bearish_points) / total_points

    # Structure alignment
    trend_direction = np.sign(df['close'].rolling(20).mean().diff(5))
    f['structure_alignment'] = (np.sign(f['structure_score']) == trend_direction).astype(int)

    return pd.DataFrame(f)
```

### 6. Order Flow Features (10)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `order_blocks_bullish` | Count of bullish OBs over 30 bars | int | AMD |
| 2 | `order_blocks_bearish` | Count of bearish OBs over 30 bars | int | AMD |
| 3 | `ob_net` | `OB_bullish - OB_bearish` | int | AMD |
| 4 | `fvg_bullish_count` | Count of unfilled bullish FVGs | int | Range |
| 5 | `fvg_bearish_count` | Count of unfilled bearish FVGs | int | Range |
| 6 | `fvg_nearest_distance` | Distance to nearest FVG | float | Range |
| 7 | `false_breakout_count` | Count of false breakouts over 30 bars | int | AMD |
| 8 | `whipsaw_intensity` | Frequency of rapid reversals | float | AMD |
| 9 | `reversal_count` | Count of reversals over 20 bars | int | AMD |
| 10 | `displacement_strength` | Strength of the latest displacement | float | SMC |

```python
import pandas as pd


def extract_order_flow_features(df, lookback=30):
    """Extract order flow features (helper detectors are defined elsewhere)."""
    f = {}

    # Order blocks
    ob_bullish = identify_order_blocks(df, 'bullish')
    ob_bearish = identify_order_blocks(df, 'bearish')

    f['order_blocks_bullish'] = count_recent(ob_bullish, lookback)
    f['order_blocks_bearish'] = count_recent(ob_bearish, lookback)
    f['ob_net'] = f['order_blocks_bullish'] - f['order_blocks_bearish']

    # Fair Value Gaps
    fvg_bullish = identify_fvg(df, 'bullish')
    fvg_bearish = identify_fvg(df, 'bearish')

    f['fvg_bullish_count'] = count_unfilled_fvg(fvg_bullish, df['close'])
    f['fvg_bearish_count'] = count_unfilled_fvg(fvg_bearish, df['close'])
    f['fvg_nearest_distance'] = calculate_nearest_fvg_distance(df, fvg_bullish + fvg_bearish)

    # False breakouts and whipsaws
    f['false_breakout_count'] = count_false_breakouts(df, lookback)
    f['whipsaw_intensity'] = calculate_whipsaw_intensity(df, lookback)

    # Reversals (note: shift(-1) peeks one bar ahead; lag this feature before
    # using it for live inference to avoid look-ahead bias)
    price_changes = df['close'].pct_change()
    reversals = ((price_changes > 0.005) & (price_changes.shift(-1) < -0.005)) | \
                ((price_changes < -0.005) & (price_changes.shift(-1) > 0.005))
    f['reversal_count'] = reversals.rolling(20).sum()

    # Displacement
    f['displacement_strength'] = calculate_displacement_strength(df)

    return pd.DataFrame(f)
```

### 7. Liquidity Features (8)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `bsl_distance` | Distance to Buy Side Liquidity | float | Liquidity |
| 2 | `ssl_distance` | Distance to Sell Side Liquidity | float | Liquidity |
| 3 | `bsl_strength` | Number of stops accumulated above | int | Liquidity |
| 4 | `ssl_strength` | Number of stops accumulated below | int | Liquidity |
| 5 | `liquidity_grab_count` | Count of recent grabs (20 bars) | int | AMD |
| 6 | `time_since_bsl_sweep` | Bars since last BSL sweep | int | Liquidity |
| 7 | `time_since_ssl_sweep` | Bars since last SSL sweep | int | Liquidity |
| 8 | `liquidity_imbalance` | `(BSL_strength - SSL_strength) / total` | float | Liquidity |

```python
import pandas as pd


def extract_liquidity_features(df, lookback=20):
    """Extract liquidity features (sweep/grab helpers are defined elsewhere)."""
    f = {}

    # Identify liquidity pools (note: centered windows use future bars, so
    # these are suitable for labeling/analysis rather than live inference)
    swing_highs = df['high'].rolling(lookback, center=True).max()
    swing_lows = df['low'].rolling(lookback, center=True).min()

    # Distances to liquidity
    f['bsl_distance'] = (swing_highs - df['close']) / df['close']
    f['ssl_distance'] = (df['close'] - swing_lows) / df['close']

    # Liquidity strength (number of swing points)
    f['bsl_strength'] = count_swing_points_above(df, lookback)
    f['ssl_strength'] = count_swing_points_below(df, lookback)

    # Liquidity grabs
    f['liquidity_grab_count'] = count_liquidity_grabs(df, lookback)

    # Time since sweeps
    bsl_sweeps = detect_bsl_sweeps(df)
    ssl_sweeps = detect_ssl_sweeps(df)

    f['time_since_bsl_sweep'] = bars_since_last(bsl_sweeps)
    f['time_since_ssl_sweep'] = bars_since_last(ssl_sweeps)

    # Liquidity imbalance
    total_liquidity = f['bsl_strength'] + f['ssl_strength'] + 1e-8
    f['liquidity_imbalance'] = (f['bsl_strength'] - f['ssl_strength']) / total_liquidity

    return pd.DataFrame(f)
```

### 8. ICT Features (15)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `ote_position` | Position within Fibonacci range (0-1) | float | ICT |
| 2 | `in_discount_zone` | Price in the 21-38% Fib zone | binary | ICT |
| 3 | `in_premium_zone` | Price in the 62-79% Fib zone | binary | ICT |
| 4 | `in_ote_buy_zone` | Optimal buy zone (discount, 21-38%) | binary | ICT |
| 5 | `in_ote_sell_zone` | Optimal sell zone (premium, 62-79%) | binary | ICT |
| 6 | `is_london_kz` | In the London Open Killzone | binary | ICT |
| 7 | `is_ny_kz` | In the NY AM Killzone | binary | ICT |
| 8 | `is_asian_kz` | In the Asian Killzone | binary | ICT |
| 9 | `killzone_strength` | Session strength (0-1) | float | ICT |
| 10 | `session_overlap` | In the London/NY overlap | binary | ICT |
| 11 | `weekly_range_position` | Position within the weekly range | float | ICT |
| 12 | `daily_range_position` | Position within the daily range | float | ICT |
| 13 | `mmsm_detected` | Market Maker Sell Model | binary | ICT |
| 14 | `mmbm_detected` | Market Maker Buy Model | binary | ICT |
| 15 | `po3_phase` | Power of 3 phase (1-3) | int | ICT |

```python
import pandas as pd


def extract_ict_features(df, timestamps):
    """Extract ICT features (killzone/session helpers are defined elsewhere)."""
    f = {}

    # OTE zones
    swing_high = df['high'].rolling(50).max()
    swing_low = df['low'].rolling(50).min()
    range_size = swing_high - swing_low

    f['ote_position'] = (df['close'] - swing_low) / range_size
    f['in_discount_zone'] = ((f['ote_position'] >= 0.21) & (f['ote_position'] <= 0.38)).astype(int)
    f['in_premium_zone'] = ((f['ote_position'] >= 0.62) & (f['ote_position'] <= 0.79)).astype(int)
    f['in_ote_buy_zone'] = f['in_discount_zone']
    f['in_ote_sell_zone'] = f['in_premium_zone']

    # Killzones
    killzones = identify_killzones(timestamps)
    f['is_london_kz'] = (killzones == 'london_open').astype(int)
    f['is_ny_kz'] = (killzones == 'ny_am').astype(int)
    f['is_asian_kz'] = (killzones == 'asian').astype(int)

    f['killzone_strength'] = get_killzone_strength(killzones)
    f['session_overlap'] = ((killzones == 'london_close') | (killzones == 'ny_am')).astype(int)

    # Range positions
    f['weekly_range_position'] = calculate_weekly_position(df)
    f['daily_range_position'] = calculate_daily_position(df)

    # Market Maker Models
    f['mmsm_detected'] = detect_mmsm(df)
    f['mmbm_detected'] = detect_mmbm(df)

    # Power of 3
    f['po3_phase'] = calculate_po3_phase(df, timestamps)

    return pd.DataFrame(f)
```

### 9. SMC Features (12)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `choch_bullish_recent` | Bullish CHOCH within 30 bars | binary | SMC |
| 2 | `choch_bearish_recent` | Bearish CHOCH within 30 bars | binary | SMC |
| 3 | `bos_bullish_recent` | Bullish BOS within 30 bars | binary | SMC |
| 4 | `bos_bearish_recent` | Bearish BOS within 30 bars | binary | SMC |
| 5 | `inducement_bullish` | Bullish inducement detected | binary | SMC |
| 6 | `inducement_bearish` | Bearish inducement detected | binary | SMC |
| 7 | `displacement_bullish` | Recent bullish displacement | binary | SMC |
| 8 | `displacement_bearish` | Recent bearish displacement | binary | SMC |
| 9 | `liquidity_void_distance` | Distance to nearest void | float | SMC |
| 10 | `structure_bullish_score` | Bullish structure score | float | SMC |
| 11 | `structure_bearish_score` | Bearish structure score | float | SMC |
| 12 | `smc_confluence_score` | SMC confluence score | float | SMC |

```python
import pandas as pd


def extract_smc_features(df, lookback=30):
    """Extract SMC features (signal detectors are defined elsewhere)."""
    f = {}

    # CHOCH
    choch_signals = detect_choch(df, lookback)
    f['choch_bullish_recent'] = has_recent_signal(choch_signals, 'bullish', 30)
    f['choch_bearish_recent'] = has_recent_signal(choch_signals, 'bearish', 30)

    # BOS
    bos_signals = detect_bos(df, lookback)
    f['bos_bullish_recent'] = has_recent_signal(bos_signals, 'bullish', 30)
    f['bos_bearish_recent'] = has_recent_signal(bos_signals, 'bearish', 30)

    # Inducement
    inducements = detect_inducement(df)
    f['inducement_bullish'] = has_recent_signal(inducements, 'bullish', 20)
    f['inducement_bearish'] = has_recent_signal(inducements, 'bearish', 20)

    # Displacement
    displacements = detect_displacement(df)
    f['displacement_bullish'] = has_recent_signal(displacements, 'bullish', 10)
    f['displacement_bearish'] = has_recent_signal(displacements, 'bearish', 10)

    # Liquidity voids
    voids = detect_liquidity_voids(df)
    f['liquidity_void_distance'] = calculate_nearest_void_distance(df['close'], voids)

    # Structure scores
    f['structure_bullish_score'] = calculate_bullish_structure_score(df)
    f['structure_bearish_score'] = calculate_bearish_structure_score(df)

    # SMC confluence
    f['smc_confluence_score'] = calculate_smc_confluence(f)

    return pd.DataFrame(f)
```

### 10. Time Features (6)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `hour_sin` | `sin(2 * pi * hour / 24)` | float | All |
| 2 | `hour_cos` | `cos(2 * pi * hour / 24)` | float | All |
| 3 | `day_of_week` | Day of the week (0-6) | int | All |
| 4 | `is_weekend` | Saturday or Sunday | binary | All |
| 5 | `time_in_session` | Minutes since session open | int | ICT |
| 6 | `minutes_to_close` | Minutes until session close | int | ICT |

```python
import numpy as np
import pandas as pd


def extract_time_features(timestamps):
    """Extract time-of-day features (expects a pandas DatetimeIndex)."""
    f = {}

    hours = timestamps.hour
    f['hour_sin'] = np.sin(2 * np.pi * hours / 24)
    f['hour_cos'] = np.cos(2 * np.pi * hours / 24)
    f['day_of_week'] = timestamps.dayofweek
    f['is_weekend'] = (timestamps.dayofweek >= 5).astype(int)

    # Session timing
    f['time_in_session'] = calculate_time_in_session(timestamps)
    f['minutes_to_close'] = calculate_minutes_to_close(timestamps)

    return pd.DataFrame(f)
```

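The sin/cos pair exists because the raw hour is discontinuous at midnight: 23:00 and 00:00 are one hour apart in time but 23 apart numerically. In the cyclical embedding they are close, as this small illustrative check shows:

```python
import numpy as np

def hour_embedding(hour):
    """Map an hour (0-23) onto the unit circle."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle)])

# Adjacent hours land close together; opposite hours land far apart
dist_23_0 = np.linalg.norm(hour_embedding(23) - hour_embedding(0))  # small
dist_12_0 = np.linalg.norm(hour_embedding(12) - hour_embedding(0))  # maximal (2.0)
```

The same encoding would apply to day-of-week if that feature were also treated as cyclical; here the document keeps `day_of_week` as a raw integer.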
---

## Targets by Model

### 1. AMDDetector Target

**Type:** Multiclass Classification (4 classes)

| Class | Value | Description |
|-------|-------|-------------|
| Neutral | 0 | No clearly defined phase |
| Accumulation | 1 | Accumulation phase |
| Manipulation | 2 | Manipulation phase |
| Distribution | 3 | Distribution phase |

**Labeling Method:**

```python
def label_amd_phase(df, i, forward_window=20):
    """
    Label the AMD phase based on future behavior.

    Criteria:
    - Accumulation: narrow range + price rises afterwards
    - Manipulation: false breakouts + whipsaws
    - Distribution: volume on declines + price falls afterwards
    - Neutral: none of the above clearly met
    """
    if i + forward_window >= len(df):
        return 0  # neutral

    future = df.iloc[i:i+forward_window]
    current_price = df['close'].iloc[i]

    # Forward-window metrics
    price_range_pct = (future['high'].max() - future['low'].min()) / current_price
    final_price = future['close'].iloc[-1]
    price_change = (final_price - current_price) / current_price

    # Volume
    volume_first_half = future['volume'].iloc[:10].mean()
    volume_second_half = future['volume'].iloc[10:].mean()

    # False breakouts
    false_breaks = count_false_breakouts_forward(df, i, forward_window)

    # ACCUMULATION criteria
    if price_range_pct < 0.02:  # range < 2%
        if price_change > 0.01:  # rises > 1% afterwards
            if volume_second_half < volume_first_half:  # declining volume
                return 1  # accumulation

    # MANIPULATION criteria
    if false_breaks >= 2:  # 2+ false breakouts
        whipsaw_count = count_whipsaws_forward(df, i, forward_window)
        if whipsaw_count >= 3:
            return 2  # manipulation

    # DISTRIBUTION criteria
    if price_change < -0.015:  # falls > 1.5%
        # High volume on declines
        down_volume = calculate_volume_on_down_moves(future)
        if down_volume > 0.6:  # 60%+ of volume on declines
            return 3  # distribution

    return 0  # neutral


def count_false_breakouts_forward(df, i, window):
    """Count false breakouts in the forward window."""
    future = df.iloc[i:i+window]
    resistance = df['high'].iloc[max(0, i-20):i].max()
    support = df['low'].iloc[max(0, i-20):i].min()

    false_breaks = 0
    for j in range(1, len(future)):
        # False breakout above resistance
        if future['high'].iloc[j] > resistance * 1.005:
            if future['close'].iloc[j] < resistance:
                false_breaks += 1
        # False breakdown below support
        if future['low'].iloc[j] < support * 0.995:
            if future['close'].iloc[j] > support:
                false_breaks += 1

    return false_breaks
```

**Expected Class Balance:**
- Neutral: ~40%
- Accumulation: ~20%
- Manipulation: ~20%
- Distribution: ~20%

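Once `label_amd_phase` has been applied bar by bar, the realized distribution can be compared against these expectations; a strongly imbalanced result (say, well over half neutral) suggests the thresholds need retuning for the instrument's volatility. A minimal sketch, using stand-in labels rather than output from a real dataset:

```python
import pandas as pd

# Stand-in labels with the expected 40/20/20/20 split (not real model output)
labels = pd.Series([0] * 40 + [1] * 20 + [2] * 20 + [3] * 20)

# Realized class proportions, indexed by class value 0-3
dist = labels.value_counts(normalize=True).sort_index()
```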
### 2. RangePredictor Target

**Type:** Regression (continuous) + Binned Classification

**Regression Targets:**

| Target | Formula | Horizon |
|--------|---------|---------|
| `delta_high_15m` | `(max_high_3bars - close) / close` | 15 min |
| `delta_low_15m` | `(close - min_low_3bars) / close` | 15 min |
| `delta_high_1h` | `(max_high_12bars - close) / close` | 1 hour |
| `delta_low_1h` | `(close - min_low_12bars) / close` | 1 hour |

**Binned Targets:**

| Bin | Range (ATR multiple) | Description |
|-----|----------------------|-------------|
| 0 | < 0.3 ATR | Very low |
| 1 | 0.3 - 0.7 ATR | Low |
| 2 | 0.7 - 1.2 ATR | Medium |
| 3 | > 1.2 ATR | High |

```python
import numpy as np
import pandas as pd


def calculate_range_targets(df, horizons={'15m': 3, '1h': 12}):
    """Compute targets for RangePredictor."""
    targets = {}
    atr = calculate_atr(df, 14)

    for name, periods in horizons.items():
        # Regression targets: extreme high/low over the next `periods` bars
        future_high = df['high'].rolling(periods).max().shift(-periods)
        future_low = df['low'].rolling(periods).min().shift(-periods)

        targets[f'delta_high_{name}'] = (future_high - df['close']) / df['close']
        targets[f'delta_low_{name}'] = (df['close'] - future_low) / df['close']

        # Binned targets (delta expressed as a multiple of ATR)
        for target_type in ['high', 'low']:
            delta = targets[f'delta_{target_type}_{name}']
            atr_ratio = delta / (atr / df['close'])

            bins = pd.cut(
                atr_ratio,
                bins=[-np.inf, 0.3, 0.7, 1.2, np.inf],
                labels=[0, 1, 2, 3]
            )
            targets[f'bin_{target_type}_{name}'] = bins

    return pd.DataFrame(targets)
```

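The bin edges map onto the table above as follows; note that `pd.cut` builds right-closed intervals, so a value sitting exactly on an edge (e.g. 0.3) falls into the lower bin. A small standalone check:

```python
import numpy as np
import pandas as pd

# Hand-picked ATR multiples spanning all four bins from the table above
atr_ratio = pd.Series([0.1, 0.3, 0.5, 1.0, 2.0])
bins = pd.cut(atr_ratio, bins=[-np.inf, 0.3, 0.7, 1.2, np.inf], labels=[0, 1, 2, 3])
# 0.1 -> bin 0, 0.3 -> bin 0 (edge, right-closed), 0.5 -> 1, 1.0 -> 2, 2.0 -> 3
```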
### 3. TPSLClassifier Target

**Type:** Binary Classification

| Value | Description |
|-------|-------------|
| 0 | Stop Loss is hit first |
| 1 | Take Profit is hit first |

**R:R Configurations:**

| Config | SL Distance | TP Distance | R:R |
|--------|-------------|-------------|-----|
| `rr_2_1` | 0.3 ATR | 0.6 ATR | 2:1 |
| `rr_3_1` | 0.3 ATR | 0.9 ATR | 3:1 |
| `rr_4_1` | 0.25 ATR | 1.0 ATR | 4:1 |

```python
import numpy as np
import pandas as pd


def calculate_tpsl_targets(df, horizons, rr_configs):
    """
    Compute targets for TPSLClassifier.

    Returns 1 if TP is hit first, 0 if SL is hit first, NaN if neither.
    """
    targets = {}
    atr = calculate_atr(df, 14)

    for horizon_name, max_bars in horizons.items():
        for rr in rr_configs:
            target_name = f'tp_first_{horizon_name}_{rr["name"]}'

            sl_distance = atr * rr['sl_atr_multiple']
            tp_distance = atr * rr['tp_atr_multiple']

            results = []
            for i in range(len(df)):
                if i + max_bars >= len(df):
                    results.append(np.nan)
                    continue

                entry_price = df['close'].iloc[i]
                sl_price = entry_price - sl_distance.iloc[i]
                tp_price = entry_price + tp_distance.iloc[i]

                # Simulate forward
                result = simulate_trade_outcome(
                    df.iloc[i+1:i+max_bars+1],
                    entry_price,
                    sl_price,
                    tp_price
                )
                results.append(result)

            targets[target_name] = results

    return pd.DataFrame(targets)


def simulate_trade_outcome(future_bars, entry, sl, tp):
    """
    Simulate the trade outcome.

    Returns: 1 (TP first), 0 (SL first), NaN (neither hit).
    """
    for _, row in future_bars.iterrows():
        # Check SL first (worst-case assumption when a bar spans both levels)
        if row['low'] <= sl:
            return 0
        # Check TP
        if row['high'] >= tp:
            return 1

    return np.nan  # neither hit within the window
```

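The SL-first check is a conservative tie-break: when a single bar spans both levels, the label assumes the stop was hit before the target. A worked illustration, re-stating the function locally so the example is self-contained:

```python
import numpy as np
import pandas as pd

def simulate_trade_outcome(future_bars, entry, sl, tp):
    """Local re-statement of the document's function, for illustration only."""
    for _, row in future_bars.iterrows():
        if row['low'] <= sl:    # SL checked first: worst-case intra-bar ordering
            return 0
        if row['high'] >= tp:
            return 1
    return np.nan

# One bar spanning BOTH levels -> labeled SL-first (0) under the convention
bars_both = pd.DataFrame({'high': [106.0], 'low': [94.0]})
out_both = simulate_trade_outcome(bars_both, entry=100.0, sl=95.0, tp=105.0)

# One bar that only reaches TP -> labeled TP-first (1)
bars_tp = pd.DataFrame({'high': [106.0], 'low': [99.0]})
out_tp = simulate_trade_outcome(bars_tp, entry=100.0, sl=95.0, tp=105.0)
```

This makes the TP-first labels slightly pessimistic, which is usually preferable to optimistic labels when training an entry filter.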
### 4. LiquidityHunter Target

**Type:** Binary Classification

| Value | Description |
|-------|-------------|
| 0 | No liquidity sweep |
| 1 | Liquidity sweep occurs |

**Sweep Types:**

| Target | Description |
|--------|-------------|
| `bsl_sweep` | Buy Side Liquidity sweep |
| `ssl_sweep` | Sell Side Liquidity sweep |
| `any_sweep` | Any sweep |

```python
import numpy as np
import pandas as pd


def calculate_liquidity_targets(df, forward_window=10, sweep_threshold=0.005):
    """
    Computes targets for LiquidityHunter.
    """
    targets = {}

    for i in range(len(df) - forward_window):
        # Current liquidity levels
        lookback = df.iloc[max(0, i - 20):i]
        swing_high = lookback['high'].max()
        swing_low = lookback['low'].min()

        # Future price action
        future = df.iloc[i:i + forward_window]

        # BSL sweep (price goes above the swing high, then reverses)
        bsl_level = swing_high * (1 + sweep_threshold)
        bsl_swept = (future['high'] >= bsl_level).any()
        bsl_reversed = bsl_swept and (future['close'].iloc[-1] < swing_high)

        # SSL sweep (price goes below the swing low, then reverses)
        ssl_level = swing_low * (1 - sweep_threshold)
        ssl_swept = (future['low'] <= ssl_level).any()
        ssl_reversed = ssl_swept and (future['close'].iloc[-1] > swing_low)

        targets.setdefault('bsl_sweep', []).append(1 if bsl_reversed else 0)
        targets.setdefault('ssl_sweep', []).append(1 if ssl_reversed else 0)
        targets.setdefault('any_sweep', []).append(1 if (bsl_reversed or ssl_reversed) else 0)

    # Pad the trailing indices that lack a full forward window
    for key in targets:
        targets[key].extend([np.nan] * forward_window)

    return pd.DataFrame(targets)
```
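To make the BSL branch concrete, here is a single toy window walked through by hand (all numbers invented): the future high pokes about 0.8% above the prior swing high of 101.0, clearing the 0.5% threshold, and the final close settles back below the swing high, so the window labels as a BSL sweep.

```python
import pandas as pd

sweep_threshold = 0.005
lookback = pd.DataFrame({'high': [100.0, 101.0, 100.5], 'low': [99.0, 99.5, 99.2]})
future = pd.DataFrame({'high': [101.8, 101.2], 'low': [100.2, 99.8], 'close': [100.4, 100.2]})

swing_high = lookback['high'].max()                                   # 101.0
bsl_level = swing_high * (1 + sweep_threshold)                        # 101.505
bsl_swept = (future['high'] >= bsl_level).any()                       # 101.8 pokes above
bsl_reversed = bsl_swept and (future['close'].iloc[-1] < swing_high)  # close back below

print(bool(bsl_swept), bool(bsl_reversed))  # -> True True
```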

### 5. ICTContextModel Target

**Type:** Continuous Score (0-1)

This model has no traditional target; instead, it computes a real-time score based on ICT context.

```python
def calculate_ict_context_score(df, timestamps):
    """
    Computes the ICT context score (0-1).

    Factors:
    - Killzone strength (40%)
    - OTE position alignment (30%)
    - Range position (20%)
    - MM model detection (10%)
    """
    score = 0.0

    # Killzone
    killzone = identify_killzone(timestamps.iloc[-1])
    kz_strength = get_killzone_strength(killzone)
    score += 0.40 * kz_strength

    # OTE alignment
    ote_pos = calculate_ote_position(df)
    if ote_pos < 0.38:  # Discount
        ote_alignment = 0.38 - ote_pos  # Better the lower it goes
    elif ote_pos > 0.62:  # Premium
        ote_alignment = ote_pos - 0.62  # Better the higher it goes
    else:
        ote_alignment = 0  # Near equilibrium
    score += 0.30 * min(ote_alignment * 3, 1.0)

    # Range position
    daily_pos = calculate_daily_range_position(df)
    range_score = abs(daily_pos - 0.5) * 2  # Better at the extremes
    score += 0.20 * range_score

    # MM model
    mm_model = detect_market_maker_model(df)
    if mm_model['model'] != 'none':
        score += 0.10 * mm_model['confidence']

    return score
```
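The OTE branch is easiest to see in isolation. The helper name `ote_alignment_score` below is ours, not part of the module; it simply restates the discount/premium logic above as a standalone function:

```python
def ote_alignment_score(ote_pos):
    # Reward positions deep in discount (<0.38) or premium (>0.62);
    # the raw distance from the band edge is scaled by 3 and capped at 1.0.
    if ote_pos < 0.38:
        ote_alignment = 0.38 - ote_pos
    elif ote_pos > 0.62:
        ote_alignment = ote_pos - 0.62
    else:
        ote_alignment = 0
    return min(ote_alignment * 3, 1.0)

print(round(ote_alignment_score(0.10), 2))  # deep discount -> 0.84
print(ote_alignment_score(0.50))            # equilibrium -> 0
print(round(ote_alignment_score(0.95), 2))  # deep premium -> 0.99
```

The factor of 3 means the sub-score saturates at 1.0 once price sits more than one third of the range beyond the band edge.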

---

## Feature Engineering Pipeline

### Complete Pipeline

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler


class FeatureEngineeringPipeline:
    """Complete feature engineering pipeline"""

    def __init__(self, config=None):
        self.config = config or self._default_config()
        self.scaler = RobustScaler()
        self.feature_names = []

    def fit_transform(self, df, timestamps=None):
        """Extracts and normalizes all features"""

        # 1. Extract all feature groups
        price_features = extract_price_action_features(df)
        volume_features = extract_volume_features(df)
        volatility_features = extract_volatility_features(df)
        trend_features = extract_trend_features(df)
        structure_features = extract_market_structure_features(df)
        order_flow_features = extract_order_flow_features(df)
        liquidity_features = extract_liquidity_features(df)

        if timestamps is not None:
            ict_features = extract_ict_features(df, timestamps)
            time_features = extract_time_features(timestamps)
        else:
            ict_features = pd.DataFrame()
            time_features = pd.DataFrame()

        smc_features = extract_smc_features(df)

        # 2. Combine all features
        all_features = pd.concat([
            price_features,
            volume_features,
            volatility_features,
            trend_features,
            structure_features,
            order_flow_features,
            liquidity_features,
            ict_features,
            smc_features,
            time_features
        ], axis=1)

        # 3. Handle NaN (forward-fill gaps, then zero-fill leading NaN)
        all_features = all_features.ffill().fillna(0)

        # 4. Store feature names
        self.feature_names = all_features.columns.tolist()

        # 5. Scale features
        scaled_features = self.scaler.fit_transform(all_features)

        return scaled_features

    def transform(self, df, timestamps=None):
        """Transform only (uses the already-fitted scaler)"""
        # ... same extraction ...
        return self.scaler.transform(all_features)

    def get_feature_importance(self, model, top_n=20):
        """Returns the top feature importances"""
        importance = pd.DataFrame({
            'feature': self.feature_names,
            'importance': model.feature_importances_
        }).sort_values('importance', ascending=False)

        return importance.head(top_n)
```
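Step 3 deserves a note: forward-fill propagates the last valid value through interior gaps, and the trailing `fillna(0)` only catches leading NaN that have no prior value. A minimal illustration (column names invented):

```python
import numpy as np
import pandas as pd

features = pd.DataFrame({
    'rsi': [np.nan, 0.4, np.nan, 0.6],  # leading NaN plus an interior gap
    'atr': [np.nan, np.nan, 1.2, 1.3],  # two leading NaN
})
cleaned = features.ffill().fillna(0)

print(cleaned['rsi'].tolist())  # -> [0.0, 0.4, 0.4, 0.6]
print(cleaned['atr'].tolist())  # -> [0.0, 0.0, 1.2, 1.3]
```

Forward-filling is safe here because it only propagates past values into later rows; filling in the other direction would leak future information into the features.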

---

## Validacion y Testing

### Metrics by Model

| Model | Primary Metric | Target | Secondary Metrics |
|-------|----------------|--------|-------------------|
| AMDDetector | Accuracy | >70% | Macro F1 >0.65, per-class precision >60% |
| RangePredictor | MAE | <0.003 | R² >0.3, directional accuracy >90% |
| TPSLClassifier | AUC | >0.85 | Accuracy >80%, precision >75% |
| LiquidityHunter | Precision | >70% | Recall >60%, F1 >0.65 |
| ICTContextModel | - | - | Qualitative validation |
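The classification metrics in the table can all be computed with scikit-learn; the labels and probabilities below are invented purely to show the calls:

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]              # hard labels for accuracy / F1
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6]  # scores for AUC

print(round(accuracy_score(y_true, y_pred), 3))             # 4 of 6 correct
print(round(f1_score(y_true, y_pred, average='macro'), 3))  # unweighted mean of per-class F1
print(round(roc_auc_score(y_true, y_prob), 3))              # ranking quality of the raw scores
```

Note that AUC is computed from the probabilities, not from the thresholded labels, so it measures how well the model ranks positives above negatives regardless of the cutoff.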

### Temporal Validation

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def temporal_validation(model, X, y, n_splits=5):
    """
    Cross-validation that respects temporal ordering.
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    scores = []

    for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]

        model.fit(X_train, y_train)
        y_pred = model.predict(X_val)

        score = calculate_metrics(y_val, y_pred)
        scores.append(score)

    return np.mean(scores), np.std(scores)
```
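`TimeSeriesSplit` produces expanding training windows whose indices always precede the validation indices, which is what prevents look-ahead leakage here:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)
for train_idx, val_idx in TimeSeriesSplit(n_splits=3).split(X):
    # Training always ends before validation begins
    print(train_idx.max() < val_idx.min(), len(train_idx), len(val_idx))
```

With 10 samples and 3 splits, the folds train on 4, 6, and 8 samples and validate on the next 2 each time; unlike `KFold`, no shuffling ever mixes future rows into the training set.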

---

**Document Generated:** 2025-12-08
**Trading Strategist - Trading Platform**