# Feature and Target Catalog - Machine Learning

**Version:** 1.0.0
**Date:** 2025-12-05
**Module:** OQI-006-ml-signals
**Author:** Trading Strategist - OrbiQuant IA

---

## Table of Contents

1. [Introduction](#introduction)
2. [Base Features (21)](#base-features-21)
3. [AMD Features (25)](#amd-features-25)
4. [ICT Features (15)](#ict-features-15)
5. [SMC Features (12)](#smc-features-12)
6. [Liquidity Features (10)](#liquidity-features-10)
7. [Microstructure Features (8)](#microstructure-features-8)
8. [Model Targets](#model-targets)
9. [Feature Engineering Pipeline](#feature-engineering-pipeline)
10. [Technical Considerations](#technical-considerations)

---

## Introduction

This document defines the complete catalog of features (input variables) and targets (prediction variables) used by the OrbiQuant IA ML models.

### Total Dimensions

| Category | Features | Models that use them |
|----------|----------|----------------------|
| **Base Technical** | 21 | All |
| **AMD** | 25 | AMDDetector, Range, TPSL |
| **ICT** | 15 | Range, TPSL, Orchestrator |
| **SMC** | 12 | Range, TPSL, Orchestrator |
| **Liquidity** | 10 | LiquidityHunter, TPSL |
| **Microstructure** | 8 | OrderFlow (optional) |
| **Total** | 91 features | - |

---

## Base Features (21)

### Category: Volatility (8)

| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `volatility_5` | `close.pct_change().rolling(5).std()` | [0, ∞) | 5-period volatility |
| `volatility_10` | `close.pct_change().rolling(10).std()` | [0, ∞) | 10-period volatility |
| `volatility_20` | `close.pct_change().rolling(20).std()` | [0, ∞) | 20-period volatility |
| `volatility_50` | `close.pct_change().rolling(50).std()` | [0, ∞) | 50-period volatility |
| `atr_5` | `TrueRange.rolling(5).mean()` | [0, ∞) | 5-period Average True Range |
| `atr_10` | `TrueRange.rolling(10).mean()` | [0, ∞) | 10-period Average True Range |
| `atr_14` | `TrueRange.rolling(14).mean()` | [0, ∞) | 14-period Average True Range (standard) |
| `atr_ratio` | `atr_14 / atr_14.rolling(50).mean()` | [0, ∞) | Current ATR vs its average |
```python
import numpy as np
import pandas as pd

def calculate_volatility_features(df):
    features = {}
    for period in [5, 10, 20, 50]:
        features[f'volatility_{period}'] = df['close'].pct_change().rolling(period).std()

    # ATR: True Range is the largest of the three classic components
    high_low = df['high'] - df['low']
    high_close = np.abs(df['high'] - df['close'].shift())
    low_close = np.abs(df['low'] - df['close'].shift())
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)

    for period in [5, 10, 14]:
        features[f'atr_{period}'] = true_range.rolling(period).mean()

    features['atr_ratio'] = features['atr_14'] / features['atr_14'].rolling(50).mean()
    return features
```

### Category: Momentum (6)

| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `momentum_5` | `close - close.shift(5)` | (-∞, ∞) | 5-period momentum |
| `momentum_10` | `close - close.shift(10)` | (-∞, ∞) | 10-period momentum |
| `momentum_20` | `close - close.shift(20)` | (-∞, ∞) | 20-period momentum |
| `roc_5` | `(close / close.shift(5) - 1) * 100` | (-100, ∞) | 5-period Rate of Change |
| `roc_10` | `(close / close.shift(10) - 1) * 100` | (-100, ∞) | 10-period Rate of Change |
| `rsi_14` | See RSI formula | [0, 100] | Relative Strength Index |

```python
def calculate_momentum_features(df):
    features = {}

    # Momentum and Rate of Change
    for period in [5, 10, 20]:
        features[f'momentum_{period}'] = df['close'] - df['close'].shift(period)
    for period in [5, 10]:
        features[f'roc_{period}'] = (df['close'] / df['close'].shift(period) - 1) * 100

    # RSI (simple-moving-average approximation of Wilder's smoothing)
    delta = df['close'].diff()
    gain = delta.where(delta > 0, 0).rolling(14).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(14).mean()
    rs = gain / loss
    features['rsi_14'] = 100 - (100 / (1 + rs))
    return features
```
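The per-category helpers in this section each return a plain dict of Series. A minimal sketch of assembling such dicts into one feature frame; `build_feature_frame` and the `momentum_only` stand-in are illustrative, not part of the catalog:

```python
import numpy as np
import pandas as pd

def build_feature_frame(df, helpers):
    """Concatenate the dicts of Series returned by per-category helpers."""
    frames = [pd.DataFrame(h(df), index=df.index) for h in helpers]
    return pd.concat(frames, axis=1)

# Illustrative stand-in for calculate_momentum_features and friends
def momentum_only(df):
    return {f'momentum_{p}': df['close'] - df['close'].shift(p) for p in (5, 10, 20)}

rng = np.random.default_rng(0)
df = pd.DataFrame({'close': pd.Series(100 + rng.normal(0, 1, 60).cumsum())})
features = build_feature_frame(df, [momentum_only])
```

Keeping helpers as dict-returning functions makes each category independently testable before concatenation.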
### Category: Moving Averages (7)

| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `sma_10` | `close.rolling(10).mean()` | [0, ∞) | 10-period Simple Moving Average |
| `sma_20` | `close.rolling(20).mean()` | [0, ∞) | 20-period Simple Moving Average |
| `sma_50` | `close.rolling(50).mean()` | [0, ∞) | 50-period Simple Moving Average |
| `sma_ratio_10` | `close / sma_10` | [0, ∞) | Price/SMA10 ratio |
| `sma_ratio_20` | `close / sma_20` | [0, ∞) | Price/SMA20 ratio |
| `sma_ratio_50` | `close / sma_50` | [0, ∞) | Price/SMA50 ratio |
| `sma_slope_20` | `sma_20.diff(5) / 5` | (-∞, ∞) | SMA20 slope |

```python
def calculate_ma_features(df):
    features = {}
    for period in [10, 20, 50]:
        features[f'sma_{period}'] = df['close'].rolling(period).mean()
        features[f'sma_ratio_{period}'] = df['close'] / features[f'sma_{period}']
    features['sma_slope_20'] = features['sma_20'].diff(5) / 5
    return features
```

---

## AMD Features (25)

### Category: Price Action (10)

| Feature | Calculation | Range | Use |
|---------|-------------|-------|-----|
| `range_ratio` | `(high - low) / (high - low).rolling(20).mean()` | [0, ∞) | Range compression |
| `range_ma` | `(high - low).rolling(20).mean()` | [0, ∞) | Average range |
| `hl_range_pct` | `(high - low) / close` | [0, 1] | Range as % of price |
| `body_size` | `abs(close - open) / (high - low)` | [0, 1] | Candle body size |
| `upper_wick` | `(high - max(close, open)) / (high - low)` | [0, 1] | Upper wick |
| `lower_wick` | `(min(close, open) - low) / (high - low)` | [0, 1] | Lower wick |
| `buying_pressure` | `(close - low) / (high - low)` | [0, 1] | Buying pressure |
| `selling_pressure` | `(high - close) / (high - low)` | [0, 1] | Selling pressure |
| `close_position` | `(close - low) / (high - low)` | [0, 1] | Position of the close |
| `range_expansion` | `(high - low) / (high - low).shift(1)` | [0, ∞) | Range expansion |
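Unlike the other categories, the Price Action table has no accompanying code. A sketch of a subset of these formulas, assuming a small epsilon to guard against zero-range (doji) bars; the epsilon is an assumption, not in the table:

```python
import pandas as pd

def calculate_price_action_features(df, eps=1e-8):
    """Candle-anatomy subset of the table above; eps guards zero-range (doji) bars."""
    rng = df['high'] - df['low'] + eps
    f = {}
    f['hl_range_pct'] = (df['high'] - df['low']) / df['close']
    f['body_size'] = (df['close'] - df['open']).abs() / rng
    f['upper_wick'] = (df['high'] - df[['close', 'open']].max(axis=1)) / rng
    f['lower_wick'] = (df[['close', 'open']].min(axis=1) - df['low']) / rng
    f['buying_pressure'] = (df['close'] - df['low']) / rng
    f['selling_pressure'] = (df['high'] - df['close']) / rng
    f['close_position'] = (df['close'] - df['low']) / rng
    f['range_expansion'] = (df['high'] - df['low']) / (df['high'] - df['low']).shift(1)
    return f

# One bullish candle: body 1/3 of the range, one-point wicks on each side
candle = pd.DataFrame({'open': [10.0], 'high': [12.0], 'low': [9.0], 'close': [11.0]})
pa = calculate_price_action_features(candle)
```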
### Category: Volume (8)

| Feature | Calculation | Description |
|---------|-------------|-------------|
| `volume_ratio` | `volume / volume.rolling(20).mean()` | Volume vs average |
| `volume_trend` | `volume.rolling(10).mean() - volume.rolling(30).mean()` | Volume trend |
| `volume_ma` | `volume.rolling(20).mean()` | Average volume |
| `volume_spike_count` | `(volume > volume_ma * 2).rolling(30).sum()` | Recent spikes |
| `obv` | See OBV calculation | On-Balance Volume |
| `obv_slope` | `obv.diff(5) / 5` | OBV trend |
| `vwap_distance` | `(close - vwap) / close` | Distance to VWAP |
| `volume_on_up` | See calculation | Volume on up bars |

```python
def calculate_volume_features(df):
    features = {}
    features['volume_ratio'] = df['volume'] / df['volume'].rolling(20).mean()
    features['volume_trend'] = df['volume'].rolling(10).mean() - df['volume'].rolling(30).mean()
    features['volume_ma'] = df['volume'].rolling(20).mean()
    features['volume_spike_count'] = (df['volume'] > features['volume_ma'] * 2).rolling(30).sum()

    # OBV (sign of the close-to-close change; flat closes contribute 0)
    obv = (df['volume'] * np.sign(df['close'].diff().fillna(0))).cumsum()
    features['obv'] = obv
    features['obv_slope'] = obv.diff(5) / 5

    # VWAP
    vwap = (df['close'] * df['volume']).cumsum() / df['volume'].cumsum()
    features['vwap_distance'] = (df['close'] - vwap) / df['close']

    # Volume on up bars (one common definition: share of recent volume on rising closes)
    up = (df['close'] > df['close'].shift(1)).astype(int)
    features['volume_on_up'] = (df['volume'] * up).rolling(20).sum() / df['volume'].rolling(20).sum()
    return features
```

### Category: Market Structure (7)

| Feature | Calculation | Use |
|---------|-------------|-----|
| `higher_highs_count` | `(high > high.shift(1)).rolling(10).sum()` | HH count |
| `higher_lows_count` | `(low > low.shift(1)).rolling(10).sum()` | HL count |
| `lower_highs_count` | `(high < high.shift(1)).rolling(10).sum()` | LH count |
| `lower_lows_count` | `(low < low.shift(1)).rolling(10).sum()` | LL count |
| `swing_high_distance` | `(swing_high_20 - close) / close` | Distance to swing high |
| `swing_low_distance` | `(close - swing_low_20) / close` | Distance to swing low |
| `market_structure_score` | See calculation | Structure score |
```python
def calculate_market_structure_features(df):
    features = {}
    features['higher_highs_count'] = (df['high'] > df['high'].shift(1)).rolling(10).sum()
    features['higher_lows_count'] = (df['low'] > df['low'].shift(1)).rolling(10).sum()
    features['lower_highs_count'] = (df['high'] < df['high'].shift(1)).rolling(10).sum()
    features['lower_lows_count'] = (df['low'] < df['low'].shift(1)).rolling(10).sum()

    swing_high = df['high'].rolling(20).max()
    swing_low = df['low'].rolling(20).min()
    features['swing_high_distance'] = (swing_high - df['close']) / df['close']
    features['swing_low_distance'] = (df['close'] - swing_low) / df['close']

    # Market structure score (-1 bearish, +1 bullish)
    bullish_score = (features['higher_highs_count'] + features['higher_lows_count']) / 20
    bearish_score = (features['lower_highs_count'] + features['lower_lows_count']) / 20
    features['market_structure_score'] = bullish_score - bearish_score
    return features
```

---

## ICT Features (15)

### Category: OTE & Fibonacci (5)

| Feature | Calculation | Range | Description |
|---------|-------------|-------|-------------|
| `ote_position` | `(close - swing_low) / (swing_high - swing_low)` | [0, 1] | Position within the range |
| `in_discount_zone` | `1 if ote_position < 0.38 else 0` | {0, 1} | In the discount zone |
| `in_premium_zone` | `1 if ote_position > 0.62 else 0` | {0, 1} | In the premium zone |
| `in_ote_buy_zone` | `1 if 0.62 <= ote_position <= 0.79 else 0` | {0, 1} | In the buy-side OTE |
| `fib_distance_50` | `abs(ote_position - 0.5)` | [0, 0.5] | Distance to equilibrium |

### Category: Killzones & Timing (5)

| Feature | Calculation | Description |
|---------|-------------|-------------|
| `is_london_kz` | Based on EST hour | London killzone |
| `is_ny_kz` | Based on EST hour | NY killzone |
| `is_asian_kz` | Based on EST hour | Asian killzone |
| `session_strength` | 0-1 by killzone | Session strength |
| `session_overlap` | Overlap detection | London/NY overlap |
```python
def calculate_ict_features(df):
    features = {}

    # OTE position within the 50-bar swing range
    swing_high = df['high'].rolling(50).max()
    swing_low = df['low'].rolling(50).min()
    range_size = swing_high - swing_low
    features['ote_position'] = (df['close'] - swing_low) / (range_size + 1e-8)
    features['in_discount_zone'] = (features['ote_position'] < 0.38).astype(int)
    features['in_premium_zone'] = (features['ote_position'] > 0.62).astype(int)
    features['in_ote_buy_zone'] = (
        (features['ote_position'] >= 0.62) & (features['ote_position'] <= 0.79)
    ).astype(int)
    features['fib_distance_50'] = np.abs(features['ote_position'] - 0.5)

    # Killzones (requires a tz-aware DatetimeIndex)
    hour_est = pd.Series(df.index.tz_convert('America/New_York').hour, index=df.index)
    features['is_london_kz'] = ((hour_est >= 2) & (hour_est < 5)).astype(int)
    features['is_ny_kz'] = ((hour_est >= 8) & (hour_est < 11)).astype(int)
    features['is_asian_kz'] = (hour_est >= 20).astype(int)  # 20:00 EST to midnight

    # Session strength (features is a plain dict, so map with np.select rather than .loc)
    conditions = [
        features['is_ny_kz'] == 1,
        features['is_london_kz'] == 1,
        features['is_asian_kz'] == 1,
    ]
    features['session_strength'] = pd.Series(
        np.select(conditions, [1.0, 0.9, 0.3], default=0.1), index=df.index
    )

    # Session overlap (London close + NY open)
    features['session_overlap'] = ((hour_est >= 10) & (hour_est < 12)).astype(int)
    return features
```

### Category: Ranges (5)

| Feature | Calculation | Notes |
|---------|-------------|-------|
| `weekly_range_position` | Position within the weekly range | 0-1 |
| `daily_range_position` | Position within the daily range | 0-1 |
| `weekly_range_size` | Weekly high - low | Absolute |
| `daily_range_size` | Daily high - low | Absolute |
| `range_expansion_daily` | Current range / average range | >1 = expansion |

---

## SMC Features (12)

### Category: Structure Breaks (6)

| Feature | Calculation | Use |
|---------|-------------|-----|
| `choch_bullish_count` | Count over a 30-bar window | Bullish CHoCHs |
| `choch_bearish_count` | Count over a 30-bar window | Bearish CHoCHs |
| `bos_bullish_count` | Count over a 30-bar window | Bullish BOS |
| `bos_bearish_count` | Count over a 30-bar window | Bearish BOS |
| `choch_recency` | Bars since last CHoCH | 0 = very recent |
| `bos_recency` | Bars since last BOS | 0 = very recent |

```python
def calculate_smc_features(df):
    features = {}

    # Detect CHoCHs and BOS (detector helpers provided by the SMC module)
    choch_signals = detect_choch(df, window=20)
    bos_signals = detect_bos(df, window=20)

    # Count by type
    features['choch_bullish_count'] = count_signals_in_window(
        choch_signals, 'bullish_choch', window=30
    )
    features['choch_bearish_count'] = count_signals_in_window(
        choch_signals, 'bearish_choch', window=30
    )
    features['bos_bullish_count'] = count_signals_in_window(
        bos_signals, 'bullish_bos', window=30
    )
    features['bos_bearish_count'] = count_signals_in_window(
        bos_signals, 'bearish_bos', window=30
    )

    # Recency
    features['choch_recency'] = bars_since_last_signal(choch_signals)
    features['bos_recency'] = bars_since_last_signal(bos_signals)
    return features
```

### Category: Displacement & Flow (6)

| Feature | Calculation | Description |
|---------|-------------|-------------|
| `displacement_strength` | Move / ATR | Displacement strength |
| `displacement_direction` | 1 = bullish, -1 = bearish, 0 = neutral | Direction |
| `displacement_recency` | Bars since last displacement | Recency |
| `inducement_count` | Count over a 20-bar window | Detected inducements |
| `inducement_bullish` | Count of bullish inducements | Bullish traps |
| `inducement_bearish` | Count of bearish inducements | Bearish traps |
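The Displacement & Flow features have no reference implementation in this document. A hedged sketch of the first three, assuming "displacement" means a close-to-close move over a few bars measured in ATRs; the `window` and `threshold` values are illustrative, and the inducement counts would additionally need a detector from the SMC module:

```python
import numpy as np
import pandas as pd

def calculate_displacement_features(df, window=3, atr_period=14, threshold=1.5):
    """Displacement = |close move over `window` bars| in ATRs (assumed definition)."""
    high_low = df['high'] - df['low']
    high_close = (df['high'] - df['close'].shift()).abs()
    low_close = (df['low'] - df['close'].shift()).abs()
    atr = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1).rolling(atr_period).mean()

    move = df['close'].diff(window)
    strength = move.abs() / (atr + 1e-8)
    is_disp = strength >= threshold
    direction = np.sign(move).where(is_disp, 0.0)

    # Bars since the last bar that cleared the threshold
    idx = pd.Series(np.arange(len(df)), index=df.index, dtype=float)
    recency = idx - idx.where(is_disp).ffill()

    return {'displacement_strength': strength,
            'displacement_direction': direction,
            'displacement_recency': recency}

# Flat tape, then a 10-point jump at bar 20
close = pd.Series([100.0] * 20 + [110.0] * 10)
df = pd.DataFrame({'close': close, 'high': close + 0.5, 'low': close - 0.5})
disp = calculate_displacement_features(df)
```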
---

## Liquidity Features (10)

| Feature | Calculation | Range | Description |
|---------|-------------|-------|-------------|
| `bsl_distance` | `(bsl_level - close) / close` | [0, ∞) | Distance to BSL |
| `ssl_distance` | `(close - ssl_level) / close` | [0, ∞) | Distance to SSL |
| `bsl_density` | Count of nearby BSL levels | [0, ∞) | BSL density |
| `ssl_density` | Count of nearby SSL levels | [0, ∞) | SSL density |
| `bsl_strength` | Volume at the BSL level | [0, ∞) | BSL strength |
| `ssl_strength` | Volume at the SSL level | [0, ∞) | SSL strength |
| `liquidity_grab_count` | Count of recent sweeps | [0, ∞) | Recent sweeps |
| `bsl_sweep_recent` | 1 if a recent sweep | {0, 1} | BSL swept |
| `ssl_sweep_recent` | 1 if a recent sweep | {0, 1} | SSL swept |
| `near_liquidity` | 1 if within 1% of a level | {0, 1} | Near liquidity |

```python
def calculate_liquidity_features(df, lookback=20):
    features = {}

    # BSL (Buy Side Liquidity) above price
    bsl_levels = find_liquidity_levels(df, 'high', lookback)
    features['bsl_distance'] = (bsl_levels['nearest'] - df['close']) / df['close']
    features['bsl_density'] = bsl_levels['density']
    features['bsl_strength'] = bsl_levels['strength']

    # SSL (Sell Side Liquidity) below price
    ssl_levels = find_liquidity_levels(df, 'low', lookback)
    features['ssl_distance'] = (df['close'] - ssl_levels['nearest']) / df['close']
    features['ssl_density'] = ssl_levels['density']
    features['ssl_strength'] = ssl_levels['strength']

    # Sweeps
    sweeps = detect_liquidity_sweeps(df, window=30)
    features['liquidity_grab_count'] = len(sweeps)
    features['bsl_sweep_recent'] = int(any(s['type'] == 'bsl' for s in sweeps[-5:]))
    features['ssl_sweep_recent'] = int(any(s['type'] == 'ssl' for s in sweeps[-5:]))

    # Proximity
    features['near_liquidity'] = (
        (features['bsl_distance'] < 0.01) | (features['ssl_distance'] < 0.01)
    ).astype(int)
    return features
```

---

## Microstructure Features (8)

**Note:** Requires granular volume data or tick data.

| Feature | Calculation | Description |
|---------|-------------|-------------|
| `volume_delta` | `buy_volume - sell_volume` | Volume delta |
| `cumulative_volume_delta` | Accumulated CVD | CVD |
| `cvd_slope` | `cvd.diff(5) / 5` | CVD trend |
| `tick_imbalance` | `(upticks - downticks) / total_ticks` | Tick imbalance |
| `large_orders_count` | Count of large orders | Institutional activity |
| `order_flow_imbalance` | Buy/sell ratio | -1 to +1 |
| `poc_distance` | Distance to the Point of Control | Distance to POC |
| `hvn_proximity` | Distance to a High Volume Node | High-volume zone |

```python
def calculate_microstructure_features(df):
    """Requires extended data: buy_volume, sell_volume, tick_data."""
    features = {}

    if 'buy_volume' in df.columns and 'sell_volume' in df.columns:
        features['volume_delta'] = df['buy_volume'] - df['sell_volume']
        features['cumulative_volume_delta'] = features['volume_delta'].cumsum()
        features['cvd_slope'] = features['cumulative_volume_delta'].diff(5) / 5

        total_volume = df['buy_volume'] + df['sell_volume']
        features['order_flow_imbalance'] = features['volume_delta'] / (total_volume + 1e-8)

    # Large orders
    threshold = df['volume'].rolling(20).mean() * 2
    features['large_orders_count'] = (df['volume'] > threshold).rolling(30).sum()

    # Volume profile (helper provided by the order-flow module)
    volume_profile = calculate_volume_profile(df, bins=50)
    features['poc_distance'] = (df['close'] - volume_profile['poc']) / df['close']
    return features
```

---

## Model Targets

### Target 1: AMD Phase (AMDDetector)

```python
TARGET_AMD_PHASE = {
    0: 'neutral',
    1: 'accumulation',
    2: 'manipulation',
    3: 'distribution'
}

def label_amd_phase(df, i, forward_window=20):
    """See the ESTRATEGIA-AMD-COMPLETA.md documentation."""
    # Full implementation in the AMD document
    pass
```

### Target 2: Delta High/Low (RangePredictor)

```python
# Regression targets
TARGETS_RANGE = {
    'delta_high_15m': float,  # Continuous prediction
    'delta_low_15m': float,
    'delta_high_1h': float,
    'delta_low_1h': float,
    # Classification targets (bins)
    'bin_high_15m': int,  # 0-3
    'bin_low_15m': int,
    'bin_high_1h': int,
    'bin_low_1h': int
}

def calculate_range_targets(df, horizons={'15m': 3, '1h': 12}):
    targets = {}
    atr = calculate_atr(df, 14)

    for name, periods in horizons.items():
        # Delta high
        targets[f'delta_high_{name}'] = (
            df['high'].rolling(periods).max().shift(-periods) - df['close']
        ) / df['close']

        # Delta low
        targets[f'delta_low_{name}'] = (
            df['close'] - df['low'].rolling(periods).min().shift(-periods)
        ) / df['close']

        # Bins (volatility normalized by ATR; the deltas are fractions of price,
        # so express ATR as a fraction of price before taking the ratio)
        def to_bin(delta_series):
            ratio = delta_series / (atr / df['close'])
            bins = pd.cut(
                ratio,
                bins=[-np.inf, 0.3, 0.7, 1.2, np.inf],
                labels=[0, 1, 2, 3]
            )
            return bins.astype(float)

        targets[f'bin_high_{name}'] = to_bin(targets[f'delta_high_{name}'])
        targets[f'bin_low_{name}'] = to_bin(targets[f'delta_low_{name}'])

    return pd.DataFrame(targets)
```

### Target 3: TP vs SL (TPSLClassifier)

```python
TARGETS_TPSL = {
    'tp_first_15m_rr_2_1': int,  # 0 or 1
    'tp_first_15m_rr_3_1': int,
    'tp_first_1h_rr_2_1': int,
    'tp_first_1h_rr_3_1': int
}

def calculate_tpsl_targets(df, rr_configs):
    """Simulates whether TP is reached before SL (long side)."""
    targets = {}
    atr = calculate_atr(df, 14)

    for rr in rr_configs:
        sl_dist = atr * rr['sl_atr_multiple']
        tp_dist = atr * rr['tp_atr_multiple']

        def check_tp_first(i, horizon_bars):
            if i + horizon_bars >= len(df):
                return np.nan
            entry_price = df['close'].iloc[i]
            sl_price = entry_price - sl_dist.iloc[i]
            tp_price = entry_price + tp_dist.iloc[i]
            future = df.iloc[i + 1:i + horizon_bars + 1]
            # SL is checked first within a bar: conservative when both are hit
            for _, row in future.iterrows():
                if row['low'] <= sl_price:
                    return 0  # SL hit first
                elif row['high'] >= tp_price:
                    return 1  # TP hit first
            return np.nan  # Neither hit

        for horizon_name, horizon_bars in [('15m', 3), ('1h', 12)]:
            target_name = f'tp_first_{horizon_name}_{rr["name"]}'
            targets[target_name] = [
                check_tp_first(i, horizon_bars) for i in range(len(df))
            ]

    return pd.DataFrame(targets)
```

### Target 4: Liquidity Sweep (LiquidityHunter)

```python
TARGETS_LIQUIDITY = {
    'bsl_sweep': int,  # 0 or 1
    'ssl_sweep': int,
    'any_sweep': int,
    'sweep_timing': int  # Bars until the sweep
}

def label_liquidity_sweep(df, i, forward_window=10):
    """Labels whether a liquidity sweep occurs within the forward window."""
    if i + forward_window >= len(df):
        return {'bsl_sweep': np.nan, 'ssl_sweep': np.nan,
                'any_sweep': np.nan, 'sweep_timing': np.nan}

    swing_high = df['high'].iloc[max(0, i - 20):i].max()
    swing_low = df['low'].iloc[max(0, i - 20):i].min()
    future = df.iloc[i:i + forward_window]

    # BSL sweep (sweep of highs)
    bsl_swept = (future['high'] >= swing_high * 1.005).any()
    # SSL sweep (sweep of lows)
    ssl_swept = (future['low'] <= swing_low * 0.995).any()

    # Timing (bars until the first sweep, not an index label)
    if bsl_swept:
        sweep_timing = int((future['high'] >= swing_high * 1.005).to_numpy().argmax())
    elif ssl_swept:
        sweep_timing = int((future['low'] <= swing_low * 0.995).to_numpy().argmax())
    else:
        sweep_timing = np.nan

    return {
        'bsl_sweep': 1 if bsl_swept else 0,
        'ssl_sweep': 1 if ssl_swept else 0,
        'any_sweep': 1 if (bsl_swept or ssl_swept) else 0,
        'sweep_timing': sweep_timing
    }
```

### Target 5: Order Flow (OrderFlowAnalyzer)

```python
TARGETS_ORDER_FLOW = {
    'flow_type': int,  # 0 = neutral, 1 = accumulation, 2 = distribution
    'institutional_activity': float  # 0-1 score
}

def label_order_flow(df, i, forward_window=50):
    """Based on CVD and large orders."""
    if 'cumulative_volume_delta' not in df.columns or i + forward_window >= len(df):
        return {'flow_type': 0, 'institutional_activity': 0.0}

    current_cvd = df['cumulative_volume_delta'].iloc[i]
    future_cvd = df['cumulative_volume_delta'].iloc[i + forward_window]
    cvd_change = future_cvd - current_cvd

    # Large orders in the window
    large_orders = df['large_orders_count'].iloc[i:i + forward_window].sum()

    if cvd_change > 0 and large_orders > 5:
        flow_type = 1  # accumulation
    elif cvd_change < 0 and large_orders > 5:
        flow_type = 2  # distribution
    else:
        flow_type = 0  # neutral

    institutional_activity = min(1.0, large_orders / 10)
    return {'flow_type': flow_type, 'institutional_activity': institutional_activity}
```

---

## Feature Engineering Pipeline

### Complete Pipeline

```python
class FeatureEngineeringPipeline:
    """Complete feature engineering pipeline."""

    def __init__(self, config=None):
        self.config = config or {}
        self.scalers = {}

    def transform(self, df):
        """Transforms raw OHLCV into the full feature set."""
        features = pd.DataFrame(index=df.index)

        # 1. Base features
        print("Extracting base features...")
        base = self._extract_base_features(df)
        features = pd.concat([features, base], axis=1)

        # 2. AMD features
        print("Extracting AMD features...")
        amd = self._extract_amd_features(df)
        features = pd.concat([features, amd], axis=1)

        # 3. ICT features
        print("Extracting ICT features...")
        ict = self._extract_ict_features(df)
        features = pd.concat([features, ict], axis=1)

        # 4. SMC features
        print("Extracting SMC features...")
        smc = self._extract_smc_features(df)
        features = pd.concat([features, smc], axis=1)

        # 5. Liquidity features
        print("Extracting liquidity features...")
        liquidity = self._extract_liquidity_features(df)
        features = pd.concat([features, liquidity], axis=1)

        # 6. Microstructure (if available)
        if 'buy_volume' in df.columns:
            print("Extracting microstructure features...")
            micro = self._extract_microstructure_features(df)
            features = pd.concat([features, micro], axis=1)

        # 7. Scaling
        print("Scaling features...")
        features_scaled = self._scale_features(features)

        # 8. Handle missing values
        features_scaled = features_scaled.ffill().fillna(0)
        return features_scaled

    def _extract_base_features(self, df):
        """Extracts the base features (21)."""
        features = {}
        # Volatility
        features.update(calculate_volatility_features(df))
        # Momentum
        features.update(calculate_momentum_features(df))
        # Moving averages
        features.update(calculate_ma_features(df))
        return pd.DataFrame(features)

    def _scale_features(self, features):
        """Scales features with a per-column RobustScaler."""
        from sklearn.preprocessing import RobustScaler

        if not self.scalers:
            # Fit scalers
            for col in features.columns:
                self.scalers[col] = RobustScaler()
                features[col] = self.scalers[col].fit_transform(
                    features[col].values.reshape(-1, 1)
                )
        else:
            # Transform with the fitted scalers
            for col in features.columns:
                if col in self.scalers:
                    features[col] = self.scalers[col].transform(
                        features[col].values.reshape(-1, 1)
                    )
        return features
```

### Using the Pipeline

```python
# Initialize
pipeline = FeatureEngineeringPipeline()

# Transform data
df_raw = load_ohlcv_data('BTCUSDT', '5m')
features = pipeline.transform(df_raw)

print(f"Features shape: {features.shape}")
print(f"Features: {features.columns.tolist()}")

# Features ready for the ML models
X = features.values
```

---

## Technical Considerations

### 1. Preventing Look-Ahead Bias

**IMPORTANT:** Never use future data to compute features.

```python
# ✅ CORRECT
sma_20 = df['close'].rolling(20).mean()

# ❌ WRONG
sma_20 = df['close'].rolling(20, center=True).mean()  # Uses future data!
```

### 2. Handling Missing Values

```python
def handle_missing(features):
    """Imputation strategy."""
    # 1. Forward fill (use the last known value)
    features = features.ffill()

    # 2. If NaNs remain at the start, use 0
    features = features.fillna(0)

    # 3. Alternative: use the median
    # features = features.fillna(features.median())
    return features
```
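The look-ahead rule above can be turned into a quick property check: a causal feature recomputed on data truncated at bar *t* must match its full-sample values up to *t*. A minimal sketch; `has_lookahead` is a hypothetical helper, not part of the pipeline:

```python
import numpy as np
import pandas as pd

def has_lookahead(feature_fn, df, t):
    """True if values up to bar t change once future bars are removed."""
    full = feature_fn(df).iloc[: t + 1]
    truncated = feature_fn(df.iloc[: t + 1])
    return not np.allclose(full.fillna(0), truncated.fillna(0))

rng = np.random.default_rng(1)
df = pd.DataFrame({'close': 100 + rng.normal(0, 1, 200).cumsum()})

causal = lambda d: d['close'].rolling(20).mean()                 # past data only
peeking = lambda d: d['close'].rolling(20, center=True).mean()   # uses future bars
```

Running this check over every feature function before training is a cheap guard against bias silently introduced by a refactor.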
### 3. Feature Scaling

```python
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler

# Price-based features → RobustScaler (handles outliers)
price_scaler = RobustScaler()

# Indicators → StandardScaler
indicator_scaler = StandardScaler()

# Ratios/percentages → MinMaxScaler
ratio_scaler = MinMaxScaler(feature_range=(0, 1))
```

### 4. Feature Selection

```python
def select_important_features(X, y, feature_names, model, top_n=50):
    """Selects the most important features."""
    # Train the model
    model.fit(X, y)

    # Get importances
    importance = pd.DataFrame({
        'feature': feature_names,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)

    # Select the top N
    selected_features = importance.head(top_n)['feature'].tolist()
    return selected_features
```

### 5. Temporal Validation

```python
def temporal_validation_split(df, train_pct=0.7, val_pct=0.15):
    """Strict temporal split (no shuffling)."""
    n = len(df)
    train_end = int(n * train_pct)
    val_end = int(n * (train_pct + val_pct))

    df_train = df.iloc[:train_end]
    df_val = df.iloc[train_end:val_end]
    df_test = df.iloc[val_end:]

    # Verify there is no overlap
    assert df_train.index[-1] < df_val.index[0]
    assert df_val.index[-1] < df_test.index[0]

    return df_train, df_val, df_test
```

---

## Dimension Summary

| Category | Features | Models |
|----------|----------|--------|
| **Base Technical** | 21 | All |
| **AMD** | 25 | AMD, Range, TPSL |
| **ICT** | 15 | Range, TPSL |
| **SMC** | 12 | Range, TPSL |
| **Liquidity** | 10 | Liquidity, TPSL |
| **Microstructure** | 8 | OrderFlow |
| **TOTAL** | **91 features** | - |

---

**Document Generated:** 2025-12-05
**Next Review:** 2025-Q1
**Contact:** ml-engineering@orbiquant.ai