---
id: FEATURES-TARGETS-ML
title: Feature and Target Catalog - Machine Learning
type: Documentation
project: trading-platform
version: 1.0.0
updated_date: 2026-01-04
---

# Feature and Target Catalog - Machine Learning

**Version**: 1.0.0 | **Date**: 2025-12-05 | **Module**: OQI-006-ml-signals | **Author**: Trading Strategist - Trading Platform


## Table of Contents

1. Introduction
2. Base Features (21)
3. AMD Features (25)
4. ICT Features (15)
5. SMC Features (12)
6. Liquidity Features (10)
7. Microstructure Features (8)
8. Model Targets
9. Feature Engineering Pipeline
10. Technical Considerations

## Introduction

This document defines the complete catalog of features (input variables) and targets (prediction objectives) used by the Trading Platform ML models.

### Total Dimensions

| Category | Features | Models using them |
|---|---|---|
| Base Technical | 21 | All |
| AMD | 25 | AMDDetector, Range, TPSL |
| ICT | 15 | Range, TPSL, Orchestrator |
| SMC | 12 | Range, TPSL, Orchestrator |
| Liquidity | 10 | LiquidityHunter, TPSL |
| Microstructure | 8 | OrderFlow (optional) |
| **Total** | **~91 features** | - |

## Base Features (21)

### Category: Volatility (8)

| Feature | Formula | Range | Description |
|---|---|---|---|
| volatility_5 | `close.pct_change().rolling(5).std()` | [0, ∞) | 5-period volatility |
| volatility_10 | `close.pct_change().rolling(10).std()` | [0, ∞) | 10-period volatility |
| volatility_20 | `close.pct_change().rolling(20).std()` | [0, ∞) | 20-period volatility |
| volatility_50 | `close.pct_change().rolling(50).std()` | [0, ∞) | 50-period volatility |
| atr_5 | `TrueRange.rolling(5).mean()` | [0, ∞) | 5-period Average True Range |
| atr_10 | `TrueRange.rolling(10).mean()` | [0, ∞) | 10-period Average True Range |
| atr_14 | `TrueRange.rolling(14).mean()` | [0, ∞) | 14-period Average True Range (standard) |
| atr_ratio | `atr_14 / atr_14.rolling(50).mean()` | [0, ∞) | Current ATR vs. its average |
```python
import numpy as np
import pandas as pd

def calculate_volatility_features(df):
    features = {}
    for period in [5, 10, 20, 50]:
        features[f'volatility_{period}'] = df['close'].pct_change().rolling(period).std()

    # ATR: true range is the max of the three candidate ranges
    high_low = df['high'] - df['low']
    high_close = np.abs(df['high'] - df['close'].shift())
    low_close = np.abs(df['low'] - df['close'].shift())
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)

    for period in [5, 10, 14]:
        features[f'atr_{period}'] = true_range.rolling(period).mean()

    features['atr_ratio'] = features['atr_14'] / features['atr_14'].rolling(50).mean()

    return features
```

### Category: Momentum (6)

| Feature | Formula | Range | Description |
|---|---|---|---|
| momentum_5 | `close - close.shift(5)` | (-∞, ∞) | 5-period momentum |
| momentum_10 | `close - close.shift(10)` | (-∞, ∞) | 10-period momentum |
| momentum_20 | `close - close.shift(20)` | (-∞, ∞) | 20-period momentum |
| roc_5 | `(close / close.shift(5) - 1) * 100` | (-100, ∞) | 5-period Rate of Change |
| roc_10 | `(close / close.shift(10) - 1) * 100` | (-100, ∞) | 10-period Rate of Change |
| rsi_14 | See RSI formula | [0, 100] | Relative Strength Index |
```python
def calculate_momentum_features(df):
    features = {}

    # Momentum (absolute price change)
    for period in [5, 10, 20]:
        features[f'momentum_{period}'] = df['close'] - df['close'].shift(period)

    # Rate of Change (percentage change); only 5 and 10 are in the catalog
    for period in [5, 10]:
        features[f'roc_{period}'] = (df['close'] / df['close'].shift(period) - 1) * 100

    # RSI (Wilder smoothing approximated with a simple rolling mean)
    delta = df['close'].diff()
    gain = delta.where(delta > 0, 0).rolling(14).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(14).mean()
    rs = gain / loss
    features['rsi_14'] = 100 - (100 / (1 + rs))

    return features
```

### Category: Moving Averages (7)

| Feature | Formula | Range | Description |
|---|---|---|---|
| sma_10 | `close.rolling(10).mean()` | [0, ∞) | Simple Moving Average 10 |
| sma_20 | `close.rolling(20).mean()` | [0, ∞) | Simple Moving Average 20 |
| sma_50 | `close.rolling(50).mean()` | [0, ∞) | Simple Moving Average 50 |
| sma_ratio_10 | `close / sma_10` | [0, ∞) | Price/SMA10 ratio |
| sma_ratio_20 | `close / sma_20` | [0, ∞) | Price/SMA20 ratio |
| sma_ratio_50 | `close / sma_50` | [0, ∞) | Price/SMA50 ratio |
| sma_slope_20 | `sma_20.diff(5) / 5` | (-∞, ∞) | SMA20 slope |
```python
def calculate_ma_features(df):
    features = {}

    for period in [10, 20, 50]:
        features[f'sma_{period}'] = df['close'].rolling(period).mean()
        features[f'sma_ratio_{period}'] = df['close'] / features[f'sma_{period}']

    # Average per-bar slope over the last 5 bars
    features['sma_slope_20'] = features['sma_20'].diff(5) / 5

    return features
```

## AMD Features (25)

### Category: Price Action (10)

| Feature | Calculation | Range | Use |
|---|---|---|---|
| range_ratio | `(high - low) / high.rolling(20).mean()` | [0, ∞) | Range compression |
| range_ma | `(high - low).rolling(20).mean()` | [0, ∞) | Average range |
| hl_range_pct | `(high - low) / close` | [0, 1] | Range as % of price |
| body_size | `abs(close - open) / (high - low)` | [0, 1] | Candle body size |
| upper_wick | `(high - max(close, open)) / (high - low)` | [0, 1] | Upper wick |
| lower_wick | `(min(close, open) - low) / (high - low)` | [0, 1] | Lower wick |
| buying_pressure | `(close - low) / (high - low)` | [0, 1] | Buying pressure |
| selling_pressure | `(high - close) / (high - low)` | [0, 1] | Selling pressure |
| close_position | `(close - low) / (high - low)` | [0, 1] | Position of the close |
| range_expansion | `(high - low) / (high - low).shift(1)` | [0, ∞) | Range expansion |
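Unlike the other categories, no reference implementation accompanies this table. A minimal sketch of the formulas above (the function name `calculate_price_action_features` is ours, not the platform's) might look like:

```python
import pandas as pd

def calculate_price_action_features(df):
    # Sketch of the Price Action features from the table above.
    features = {}
    # Guard against zero-range bars before dividing
    candle_range = (df['high'] - df['low']).replace(0, float('nan'))

    features['range_ma'] = (df['high'] - df['low']).rolling(20).mean()
    features['range_ratio'] = (df['high'] - df['low']) / df['high'].rolling(20).mean()
    features['hl_range_pct'] = (df['high'] - df['low']) / df['close']
    features['body_size'] = (df['close'] - df['open']).abs() / candle_range
    features['upper_wick'] = (df['high'] - df[['close', 'open']].max(axis=1)) / candle_range
    features['lower_wick'] = (df[['close', 'open']].min(axis=1) - df['low']) / candle_range
    features['buying_pressure'] = (df['close'] - df['low']) / candle_range
    features['selling_pressure'] = (df['high'] - df['close']) / candle_range
    features['close_position'] = features['buying_pressure']  # same formula in the table
    features['range_expansion'] = (df['high'] - df['low']) / (df['high'] - df['low']).shift(1)

    return features
```

Note that `close_position` and `buying_pressure` share the same formula in the catalog; keeping both preserves the documented 10-feature count.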

### Category: Volume (8)

| Feature | Calculation | Description |
|---|---|---|
| volume_ratio | `volume / volume.rolling(20).mean()` | Volume vs. average |
| volume_trend | `volume.rolling(10).mean() - volume.rolling(30).mean()` | Volume trend |
| volume_ma | `volume.rolling(20).mean()` | Average volume |
| volume_spike_count | `(volume > volume_ma * 2).rolling(30).sum()` | Recent spikes |
| obv | See OBV calculation | On-Balance Volume |
| obv_slope | `obv.diff(5) / 5` | OBV trend |
| vwap_distance | `(close - vwap) / close` | Distance to VWAP |
| volume_on_up | See calculation | Volume on up moves |
```python
def calculate_volume_features(df):
    features = {}

    features['volume_ratio'] = df['volume'] / df['volume'].rolling(20).mean()
    features['volume_trend'] = df['volume'].rolling(10).mean() - df['volume'].rolling(30).mean()
    features['volume_ma'] = df['volume'].rolling(20).mean()
    features['volume_spike_count'] = (df['volume'] > features['volume_ma'] * 2).rolling(30).sum()

    # OBV: add volume on up closes, subtract it on down closes
    obv = (df['volume'] * ((df['close'] > df['close'].shift(1)).astype(int) * 2 - 1)).cumsum()
    features['obv'] = obv
    features['obv_slope'] = obv.diff(5) / 5

    # VWAP (cumulative; session resets not shown here)
    vwap = (df['close'] * df['volume']).cumsum() / df['volume'].cumsum()
    features['vwap_distance'] = (df['close'] - vwap) / df['close']

    return features
```

### Category: Market Structure (7)

| Feature | Calculation | Use |
|---|---|---|
| higher_highs_count | `(high > high.shift(1)).rolling(10).sum()` | HH count |
| higher_lows_count | `(low > low.shift(1)).rolling(10).sum()` | HL count |
| lower_highs_count | `(high < high.shift(1)).rolling(10).sum()` | LH count |
| lower_lows_count | `(low < low.shift(1)).rolling(10).sum()` | LL count |
| swing_high_distance | `(swing_high_20 - close) / close` | Distance to swing high |
| swing_low_distance | `(close - swing_low_20) / close` | Distance to swing low |
| market_structure_score | See calculation | Structure score |
```python
def calculate_market_structure_features(df):
    features = {}

    features['higher_highs_count'] = (df['high'] > df['high'].shift(1)).rolling(10).sum()
    features['higher_lows_count'] = (df['low'] > df['low'].shift(1)).rolling(10).sum()
    features['lower_highs_count'] = (df['high'] < df['high'].shift(1)).rolling(10).sum()
    features['lower_lows_count'] = (df['low'] < df['low'].shift(1)).rolling(10).sum()

    swing_high = df['high'].rolling(20).max()
    swing_low = df['low'].rolling(20).min()

    features['swing_high_distance'] = (swing_high - df['close']) / df['close']
    features['swing_low_distance'] = (df['close'] - swing_low) / df['close']

    # Market structure score (-1 bearish, +1 bullish)
    bullish_score = (features['higher_highs_count'] + features['higher_lows_count']) / 20
    bearish_score = (features['lower_highs_count'] + features['lower_lows_count']) / 20
    features['market_structure_score'] = bullish_score - bearish_score

    return features
```

## ICT Features (15)

### Category: OTE & Fibonacci (5)

| Feature | Calculation | Range | Description |
|---|---|---|---|
| ote_position | `(close - swing_low) / (swing_high - swing_low)` | [0, 1] | Position in range |
| in_discount_zone | `1 if ote_position < 0.38 else 0` | {0, 1} | In discount zone |
| in_premium_zone | `1 if ote_position > 0.62 else 0` | {0, 1} | In premium zone |
| in_ote_buy_zone | `1 if 0.62 <= ote_position <= 0.79 else 0` | {0, 1} | In buy-side OTE |
| fib_distance_50 | `abs(ote_position - 0.5)` | [0, 0.5] | Distance to equilibrium |

### Category: Killzones & Timing (5)

| Feature | Calculation | Description |
|---|---|---|
| is_london_kz | Based on EST hour | London killzone |
| is_ny_kz | Based on EST hour | New York killzone |
| is_asian_kz | Based on EST hour | Asian killzone |
| session_strength | 0-1 per killzone | Session strength |
| session_overlap | Overlap detection | London/NY overlap |
```python
def calculate_ict_features(df):
    features = {}

    # OTE position within the 50-bar swing range
    swing_high = df['high'].rolling(50).max()
    swing_low = df['low'].rolling(50).min()
    range_size = swing_high - swing_low

    features['ote_position'] = (df['close'] - swing_low) / (range_size + 1e-8)
    features['in_discount_zone'] = (features['ote_position'] < 0.38).astype(int)
    features['in_premium_zone'] = (features['ote_position'] > 0.62).astype(int)
    features['in_ote_buy_zone'] = (
        (features['ote_position'] >= 0.62) & (features['ote_position'] <= 0.79)
    ).astype(int)
    features['fib_distance_50'] = np.abs(features['ote_position'] - 0.5)

    # Killzones (wrap the hour index in a Series so it aligns with df)
    hour_est = pd.Series(df.index.tz_convert('America/New_York').hour, index=df.index)
    features['is_london_kz'] = ((hour_est >= 2) & (hour_est < 5)).astype(int)
    features['is_ny_kz'] = ((hour_est >= 8) & (hour_est < 11)).astype(int)
    features['is_asian_kz'] = (hour_est >= 20).astype(int)  # 20:00 EST to midnight

    # Session strength (features is a dict, so build the Series explicitly)
    session_strength = pd.Series(0.1, index=df.index)  # default
    session_strength[features['is_asian_kz'] == 1] = 0.3
    session_strength[features['is_london_kz'] == 1] = 0.9
    session_strength[features['is_ny_kz'] == 1] = 1.0
    features['session_strength'] = session_strength

    # Session overlap: London close + NY open
    features['session_overlap'] = ((hour_est >= 10) & (hour_est < 12)).astype(int)

    return features
```

### Category: Ranges (5)

| Feature | Calculation | Description |
|---|---|---|
| weekly_range_position | Position within weekly range | 0-1 |
| daily_range_position | Position within daily range | 0-1 |
| weekly_range_size | Weekly high - low | Absolute |
| daily_range_size | Daily high - low | Absolute |
| range_expansion_daily | Current/average range ratio | >1 = expansion |
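This table ships without a reference implementation. A sketch under two assumptions (a tz-aware `DatetimeIndex`, and running day/week extremes so no future bars leak in; the function name `calculate_range_features` is ours):

```python
import pandas as pd

def calculate_range_features(df):
    # Illustrative sketch for the Ranges table. Uses the expanding high/low of
    # the current day/week up to the current bar, keeping the feature causal.
    features = {}

    day = df.index.normalize()
    day_high = df['high'].groupby(day).cummax()
    day_low = df['low'].groupby(day).cummin()
    features['daily_range_size'] = day_high - day_low
    features['daily_range_position'] = (
        (df['close'] - day_low) / (features['daily_range_size'] + 1e-8)
    )

    week = df.index.to_period('W')
    week_high = df['high'].groupby(week).cummax()
    week_low = df['low'].groupby(week).cummin()
    features['weekly_range_size'] = week_high - week_low
    features['weekly_range_position'] = (
        (df['close'] - week_low) / (features['weekly_range_size'] + 1e-8)
    )

    # Expansion: today's running range vs. its 20-bar average
    features['range_expansion_daily'] = (
        features['daily_range_size'] / features['daily_range_size'].rolling(20).mean()
    )

    return features
```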

## SMC Features (12)

### Category: Structure Breaks (6)

| Feature | Calculation | Use |
|---|---|---|
| choch_bullish_count | Count over 30-bar window | Bullish CHoCHs |
| choch_bearish_count | Count over 30-bar window | Bearish CHoCHs |
| bos_bullish_count | Count over 30-bar window | Bullish BOS |
| bos_bearish_count | Count over 30-bar window | Bearish BOS |
| choch_recency | Bars since last CHoCH | 0 = very recent |
| bos_recency | Bars since last BOS | 0 = very recent |
```python
def calculate_smc_features(df):
    features = {}

    # Detect CHoCHs and BOS (helpers provided by the SMC module)
    choch_signals = detect_choch(df, window=20)
    bos_signals = detect_bos(df, window=20)

    # Count by type
    features['choch_bullish_count'] = count_signals_in_window(
        choch_signals, 'bullish_choch', window=30
    )
    features['choch_bearish_count'] = count_signals_in_window(
        choch_signals, 'bearish_choch', window=30
    )
    features['bos_bullish_count'] = count_signals_in_window(
        bos_signals, 'bullish_bos', window=30
    )
    features['bos_bearish_count'] = count_signals_in_window(
        bos_signals, 'bearish_bos', window=30
    )

    # Recency
    features['choch_recency'] = bars_since_last_signal(choch_signals)
    features['bos_recency'] = bars_since_last_signal(bos_signals)

    return features
```
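The helpers used above (`detect_choch`, `detect_bos`, `count_signals_in_window`, `bars_since_last_signal`) live in the SMC module and operate on its signal objects. As an illustration of the recency logic only, a minimal `bars_since_last_signal` over a plain boolean signal series could look like this (an assumption-laden sketch, not the module's implementation):

```python
import numpy as np
import pandas as pd

def bars_since_last_signal(signal_mask):
    # signal_mask: boolean Series, True where a signal fired.
    # Returns bars elapsed since the most recent signal (NaN before the first).
    bar_idx = pd.Series(np.arange(len(signal_mask)), index=signal_mask.index, dtype=float)
    last_signal = bar_idx.where(signal_mask).ffill()
    return bar_idx - last_signal

mask = pd.Series([False, True, False, False, True, False])
print(bars_since_last_signal(mask).tolist())  # [nan, 0.0, 1.0, 2.0, 0.0, 1.0]
```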

### Category: Displacement & Flow (6)

| Feature | Calculation | Description |
|---|---|---|
| displacement_strength | Move / ATR | Displacement strength |
| displacement_direction | 1=bullish, -1=bearish, 0=neutral | Direction |
| displacement_recency | Bars since last displacement | Recency |
| inducement_count | Count over 20-bar window | Detected inducements |
| inducement_bullish | Bullish inducement count | Bullish traps |
| inducement_bearish | Bearish inducement count | Bearish traps |
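No code accompanies this table either. A sketch of the two simplest columns, under the assumption that a "displacement" is a candle whose body exceeds some ATR multiple (the threshold, the definition, and the function name are ours, not the platform's canonical detector):

```python
import numpy as np
import pandas as pd

def calculate_displacement_features(df, atr, threshold=1.5):
    # Illustrative sketch: a bar is a displacement when its body exceeds
    # `threshold` ATRs.
    features = {}
    body = df['close'] - df['open']

    features['displacement_strength'] = body.abs() / (atr + 1e-8)
    is_displacement = features['displacement_strength'] > threshold
    # +1 bullish, -1 bearish on displacement bars, 0 elsewhere
    features['displacement_direction'] = np.sign(body).where(is_displacement, 0.0)

    return features
```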

## Liquidity Features (10)

| Feature | Calculation | Range | Description |
|---|---|---|---|
| bsl_distance | `(bsl_level - close) / close` | [0, ∞) | Distance to BSL |
| ssl_distance | `(close - ssl_level) / close` | [0, ∞) | Distance to SSL |
| bsl_density | Count of nearby BSL levels | [0, ∞) | BSL density |
| ssl_density | Count of nearby SSL levels | [0, ∞) | SSL density |
| bsl_strength | Volume at BSL level | [0, ∞) | BSL strength |
| ssl_strength | Volume at SSL level | [0, ∞) | SSL strength |
| liquidity_grab_count | Recent sweep count | [0, ∞) | Recent sweeps |
| bsl_sweep_recent | 1 if recent sweep | {0, 1} | BSL swept |
| ssl_sweep_recent | 1 if recent sweep | {0, 1} | SSL swept |
| near_liquidity | 1 if within 1% of a level | {0, 1} | Near liquidity |
```python
def calculate_liquidity_features(df, lookback=20):
    features = {}

    # BSL (Buy Side Liquidity): resting liquidity above swing highs
    bsl_levels = find_liquidity_levels(df, 'high', lookback)
    features['bsl_distance'] = (bsl_levels['nearest'] - df['close']) / df['close']
    features['bsl_density'] = bsl_levels['density']
    features['bsl_strength'] = bsl_levels['strength']

    # SSL (Sell Side Liquidity): resting liquidity below swing lows
    ssl_levels = find_liquidity_levels(df, 'low', lookback)
    features['ssl_distance'] = (df['close'] - ssl_levels['nearest']) / df['close']
    features['ssl_density'] = ssl_levels['density']
    features['ssl_strength'] = ssl_levels['strength']

    # Sweeps
    sweeps = detect_liquidity_sweeps(df, window=30)
    features['liquidity_grab_count'] = len(sweeps)
    features['bsl_sweep_recent'] = int(any(s['type'] == 'bsl' for s in sweeps[-5:]))
    features['ssl_sweep_recent'] = int(any(s['type'] == 'ssl' for s in sweeps[-5:]))

    # Proximity
    features['near_liquidity'] = (
        (features['bsl_distance'] < 0.01) | (features['ssl_distance'] < 0.01)
    ).astype(int)

    return features
```

## Microstructure Features (8)

Note: requires granular volume data or tick data.

| Feature | Calculation | Description |
|---|---|---|
| volume_delta | `buy_volume - sell_volume` | Volume delta |
| cumulative_volume_delta | Cumulative CVD | CVD |
| cvd_slope | `cvd.diff(5) / 5` | CVD trend |
| tick_imbalance | `(upticks - downticks) / total_ticks` | Tick imbalance |
| large_orders_count | Count of large orders | Institutional activity |
| order_flow_imbalance | Buy/sell ratio | -1 to +1 |
| poc_distance | Distance to Point of Control | Distance to POC |
| hvn_proximity | Distance to High Volume Node | High-volume zone |
```python
def calculate_microstructure_features(df):
    """
    Requires extended data: buy_volume, sell_volume, tick data.
    """
    features = {}

    if 'buy_volume' in df.columns and 'sell_volume' in df.columns:
        features['volume_delta'] = df['buy_volume'] - df['sell_volume']
        features['cumulative_volume_delta'] = features['volume_delta'].cumsum()
        features['cvd_slope'] = features['cumulative_volume_delta'].diff(5) / 5

        total_volume = df['buy_volume'] + df['sell_volume']
        features['order_flow_imbalance'] = features['volume_delta'] / (total_volume + 1e-8)

        # Large orders: volume above twice its 20-bar average
        threshold = df['volume'].rolling(20).mean() * 2
        features['large_orders_count'] = (df['volume'] > threshold).rolling(30).sum()

    # Volume profile (helper provided by the volume-profile module)
    volume_profile = calculate_volume_profile(df, bins=50)
    features['poc_distance'] = (df['close'] - volume_profile['poc']) / df['close']

    return features
```

## Model Targets

### Target 1: AMD Phase (AMDDetector)

```python
TARGET_AMD_PHASE = {
    0: 'neutral',
    1: 'accumulation',
    2: 'manipulation',
    3: 'distribution'
}

def label_amd_phase(df, i, forward_window=20):
    """
    See ESTRATEGIA-AMD-COMPLETA.md for the labeling rules.
    """
    # Full implementation lives in the AMD document
    pass
```

### Target 2: Delta High/Low (RangePredictor)

```python
# Regression targets
TARGETS_RANGE = {
    'delta_high_15m': float,   # Continuous prediction
    'delta_low_15m': float,
    'delta_high_1h': float,
    'delta_low_1h': float,

    # Classification targets (bins)
    'bin_high_15m': int,      # 0-3
    'bin_low_15m': int,
    'bin_high_1h': int,
    'bin_low_1h': int
}
```

```python
def calculate_range_targets(df, horizons=None):
    horizons = horizons or {'15m': 3, '1h': 12}
    targets = {}
    atr = calculate_atr(df, 14)

    for name, periods in horizons.items():
        # Delta high: how far above the current close the future high reaches
        targets[f'delta_high_{name}'] = (
            df['high'].rolling(periods).max().shift(-periods) - df['close']
        ) / df['close']

        # Delta low: how far below the current close the future low reaches
        targets[f'delta_low_{name}'] = (
            df['close'] - df['low'].rolling(periods).min().shift(-periods)
        ) / df['close']

        # Bins (delta normalized by ATR)
        def to_bin(delta_series):
            ratio = delta_series / atr
            bins = pd.cut(
                ratio,
                bins=[-np.inf, 0.3, 0.7, 1.2, np.inf],
                labels=[0, 1, 2, 3]
            )
            return bins.astype(float)

        targets[f'bin_high_{name}'] = to_bin(targets[f'delta_high_{name}'])
        targets[f'bin_low_{name}'] = to_bin(targets[f'delta_low_{name}'])

    return pd.DataFrame(targets)
```

### Target 3: TP vs SL (TPSLClassifier)

```python
TARGETS_TPSL = {
    'tp_first_15m_rr_2_1': int,  # 0 or 1
    'tp_first_15m_rr_3_1': int,
    'tp_first_1h_rr_2_1': int,
    'tp_first_1h_rr_3_1': int
}
```

```python
def calculate_tpsl_targets(df, rr_configs):
    """
    Simulates whether TP is reached before SL (long-side entries).
    """
    targets = {}
    atr = calculate_atr(df, 14)

    for rr in rr_configs:
        sl_dist = atr * rr['sl_atr_multiple']
        tp_dist = atr * rr['tp_atr_multiple']

        def check_tp_first(i, horizon_bars):
            if i + horizon_bars >= len(df):
                return np.nan

            entry_price = df['close'].iloc[i]
            sl_price = entry_price - sl_dist.iloc[i]
            tp_price = entry_price + tp_dist.iloc[i]

            future = df.iloc[i+1:i+horizon_bars+1]

            for _, row in future.iterrows():
                if row['low'] <= sl_price:
                    return 0  # SL hit first
                elif row['high'] >= tp_price:
                    return 1  # TP hit first

            return np.nan  # Neither hit

        for horizon_name, horizon_bars in [('15m', 3), ('1h', 12)]:
            target_name = f'tp_first_{horizon_name}_{rr["name"]}'
            targets[target_name] = [
                check_tp_first(i, horizon_bars) for i in range(len(df))
            ]

    return pd.DataFrame(targets)
```

### Target 4: Liquidity Sweep (LiquidityHunter)

```python
TARGETS_LIQUIDITY = {
    'bsl_sweep': int,      # 0 or 1
    'ssl_sweep': int,
    'any_sweep': int,
    'sweep_timing': int    # Bars until sweep
}
```

```python
def label_liquidity_sweep(df, i, forward_window=10):
    """
    Labels whether a liquidity sweep occurs within the forward window.
    """
    if i + forward_window >= len(df):
        return {'bsl_sweep': np.nan, 'ssl_sweep': np.nan,
                'any_sweep': np.nan, 'sweep_timing': np.nan}

    swing_high = df['high'].iloc[max(0, i-20):i].max()
    swing_low = df['low'].iloc[max(0, i-20):i].min()

    future = df.iloc[i:i+forward_window]

    # BSL sweep (sweep of the highs, with a 0.5% buffer)
    bsl_hits = future['high'] >= swing_high * 1.005
    bsl_swept = bsl_hits.any()

    # SSL sweep (sweep of the lows)
    ssl_hits = future['low'] <= swing_low * 0.995
    ssl_swept = ssl_hits.any()

    # Timing in bars from i: position of the first True in the mask
    if bsl_swept:
        sweep_timing = int(bsl_hits.values.argmax())
    elif ssl_swept:
        sweep_timing = int(ssl_hits.values.argmax())
    else:
        sweep_timing = np.nan

    return {
        'bsl_sweep': 1 if bsl_swept else 0,
        'ssl_sweep': 1 if ssl_swept else 0,
        'any_sweep': 1 if (bsl_swept or ssl_swept) else 0,
        'sweep_timing': sweep_timing
    }
```

### Target 5: Order Flow (OrderFlowAnalyzer)

```python
TARGETS_ORDER_FLOW = {
    'flow_type': int,          # 0=neutral, 1=accumulation, 2=distribution
    'institutional_activity': float  # 0-1 score
}
```

```python
def label_order_flow(df, i, forward_window=50):
    """
    Based on CVD and large orders.
    """
    if ('cumulative_volume_delta' not in df.columns
            or i + forward_window >= len(df)):
        return {'flow_type': 0, 'institutional_activity': 0.0}

    current_cvd = df['cumulative_volume_delta'].iloc[i]
    future_cvd = df['cumulative_volume_delta'].iloc[i + forward_window]

    cvd_change = future_cvd - current_cvd

    # Large orders in the window
    large_orders = df['large_orders_count'].iloc[i:i+forward_window].sum()

    if cvd_change > 0 and large_orders > 5:
        flow_type = 1  # accumulation
    elif cvd_change < 0 and large_orders > 5:
        flow_type = 2  # distribution
    else:
        flow_type = 0  # neutral

    institutional_activity = min(1.0, large_orders / 10)

    return {
        'flow_type': flow_type,
        'institutional_activity': institutional_activity
    }
```

## Feature Engineering Pipeline

### Full Pipeline

```python
class FeatureEngineeringPipeline:
    """
    End-to-end feature engineering pipeline.
    """

    def __init__(self, config=None):
        self.config = config or {}
        self.scalers = {}

    def transform(self, df):
        """
        Transforms raw OHLCV into the full feature set.
        """
        features = pd.DataFrame(index=df.index)

        # 1. Base features
        print("Extracting base features...")
        base = self._extract_base_features(df)
        features = pd.concat([features, base], axis=1)

        # 2. AMD features
        print("Extracting AMD features...")
        amd = self._extract_amd_features(df)
        features = pd.concat([features, amd], axis=1)

        # 3. ICT features
        print("Extracting ICT features...")
        ict = self._extract_ict_features(df)
        features = pd.concat([features, ict], axis=1)

        # 4. SMC features
        print("Extracting SMC features...")
        smc = self._extract_smc_features(df)
        features = pd.concat([features, smc], axis=1)

        # 5. Liquidity features
        print("Extracting liquidity features...")
        liquidity = self._extract_liquidity_features(df)
        features = pd.concat([features, liquidity], axis=1)

        # 6. Microstructure (if available)
        if 'buy_volume' in df.columns:
            print("Extracting microstructure features...")
            micro = self._extract_microstructure_features(df)
            features = pd.concat([features, micro], axis=1)

        # 7. Scaling
        print("Scaling features...")
        features_scaled = self._scale_features(features)

        # 8. Handle missing values
        features_scaled = features_scaled.ffill().fillna(0)

        return features_scaled

    def _extract_base_features(self, df):
        """Extracts the 21 base features."""
        features = {}

        # Volatility
        features.update(calculate_volatility_features(df))

        # Momentum
        features.update(calculate_momentum_features(df))

        # Moving averages
        features.update(calculate_ma_features(df))

        return pd.DataFrame(features)

    def _scale_features(self, features):
        """Scales features with RobustScaler."""
        from sklearn.preprocessing import RobustScaler

        if not self.scalers:
            # Fit scalers
            for col in features.columns:
                self.scalers[col] = RobustScaler()
                features[col] = self.scalers[col].fit_transform(
                    features[col].values.reshape(-1, 1)
                )
        else:
            # Transform with fitted scalers
            for col in features.columns:
                if col in self.scalers:
                    features[col] = self.scalers[col].transform(
                        features[col].values.reshape(-1, 1)
                    )

        return features
```

### Pipeline Usage

```python
# Initialize
pipeline = FeatureEngineeringPipeline()

# Transform data
df_raw = load_ohlcv_data('BTCUSDT', '5m')
features = pipeline.transform(df_raw)

print(f"Features shape: {features.shape}")
print(f"Features: {features.columns.tolist()}")

# Features ready for ML models
X = features.values
```

## Technical Considerations

### 1. Look-Ahead Bias Prevention

IMPORTANT: never use future data to compute features.

```python
# ✅ CORRECT
sma_20 = df['close'].rolling(20).mean()

# ❌ INCORRECT
sma_20 = df['close'].rolling(20, center=True).mean()  # Uses future data!
```
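A cheap way to catch such bugs automatically (a sketch, not part of the platform's test suite): recompute the feature on a truncated copy of the data and assert the overlapping values are unchanged. A causal feature passes; a centered window does not.

```python
import pandas as pd

def assert_no_lookahead(feature_fn, df, cut=50):
    """If a feature uses only past data, computing it on a truncated frame
    must reproduce the same values on the shared prefix."""
    full = feature_fn(df)
    truncated = feature_fn(df.iloc[:-cut])
    pd.testing.assert_series_equal(full.iloc[:-cut], truncated)

df = pd.DataFrame({'close': range(200)}, dtype=float)

# Causal rolling mean passes
assert_no_lookahead(lambda d: d['close'].rolling(20).mean(), df)

# Centered rolling window leaks future bars and fails the same check
try:
    assert_no_lookahead(lambda d: d['close'].rolling(20, center=True).mean(), df)
    raise SystemExit("centered window went undetected")
except AssertionError:
    pass  # expected
```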

### 2. Handling Missing Values

```python
def handle_missing(features):
    """
    Imputation strategy.
    """
    # 1. Forward fill (carry the last known value)
    features = features.ffill()

    # 2. Any NaNs left at the start become 0
    features = features.fillna(0)

    # 3. Alternative: use the median
    # features = features.fillna(features.median())

    return features
```

### 3. Feature Scaling

```python
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler

# Price-based features → RobustScaler (handles outliers)
price_scaler = RobustScaler()

# Indicators → StandardScaler
indicator_scaler = StandardScaler()

# Ratios/percentages → MinMaxScaler
ratio_scaler = MinMaxScaler(feature_range=(0, 1))
```
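The three scalers above can be applied per column group in one step with scikit-learn's `ColumnTransformer`. A sketch (the column groupings here are illustrative placeholders, to be mapped onto the catalog's actual features):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler

# Illustrative column groups
price_cols = ['sma_20', 'atr_14']
indicator_cols = ['rsi_14']
ratio_cols = ['close_position']

scaler = ColumnTransformer([
    ('price', RobustScaler(), price_cols),
    ('indicator', StandardScaler(), indicator_cols),
    ('ratio', MinMaxScaler(feature_range=(0, 1)), ratio_cols),
])

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)),
                 columns=price_cols + indicator_cols + ratio_cols)

# Fit on training data only, then reuse the fitted scaler on validation/test
X_scaled = scaler.fit_transform(X)
print(X_scaled.shape)
```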

### 4. Feature Selection

```python
def select_important_features(X, y, model, feature_names, top_n=50):
    """
    Selects the most important features.
    """
    # Train model
    model.fit(X, y)

    # Get importances (requires a model exposing feature_importances_)
    importance = pd.DataFrame({
        'feature': feature_names,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)

    # Select top N
    selected_features = importance.head(top_n)['feature'].tolist()

    return selected_features
```

### 5. Temporal Validation

```python
def temporal_validation_split(df, train_pct=0.7, val_pct=0.15):
    """
    Strict temporal split (no shuffling).
    """
    n = len(df)
    train_end = int(n * train_pct)
    val_end = int(n * (train_pct + val_pct))

    df_train = df.iloc[:train_end]
    df_val = df.iloc[train_end:val_end]
    df_test = df.iloc[val_end:]

    # Verify there is no overlap
    assert df_train.index[-1] < df_val.index[0]
    assert df_val.index[-1] < df_test.index[0]

    return df_train, df_val, df_test
```
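A single split can also be generalized to walk-forward validation; scikit-learn's `TimeSeriesSplit` produces expanding-window folds that preserve temporal order (shown here as an option, not the platform's mandated procedure):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Each fold trains on an expanding past window and tests on the next block
    assert train_idx.max() < test_idx.min()  # temporal order preserved
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```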

## Dimension Summary

| Category | Features | Models |
|---|---|---|
| Base Technical | 21 | All |
| AMD | 25 | AMD, Range, TPSL |
| ICT | 15 | Range, TPSL |
| SMC | 12 | Range, TPSL |
| Liquidity | 10 | Liquidity, TPSL |
| Microstructure | 8 | OrderFlow |
| **TOTAL** | **91 features** | - |

**Document generated**: 2025-12-05 | **Next review**: 2025-Q1 | **Contact**: ml-engineering@trading.ai