| id | title | type | project | version | updated_date |
|----|-------|------|---------|---------|--------------|
| FEATURES-TARGETS-ML | Feature and Target Catalog - Machine Learning | Documentation | trading-platform | 1.0.0 | 2026-01-04 |
# Feature and Target Catalog - Machine Learning

**Version:** 1.0.0
**Date:** 2025-12-05
**Module:** OQI-006-ml-signals
**Author:** Trading Strategist - Trading Platform
## Table of Contents

- Introduction
- Base Features (21)
- AMD Features (25)
- ICT Features (15)
- SMC Features (12)
- Liquidity Features (10)
- Microstructure Features (8)
- Targets for Models
- Feature Engineering Pipeline
- Technical Considerations
## Introduction

This document defines the complete catalog of features (input variables) and targets (prediction objectives) used by the ML models of Trading Platform.
### Total Dimensions

| Category | Features | Models that use them |
|----------|----------|----------------------|
| Technical Base | 21 | All |
| AMD | 25 | AMDDetector, Range, TPSL |
| ICT | 15 | Range, TPSL, Orchestrator |
| SMC | 12 | Range, TPSL, Orchestrator |
| Liquidity | 10 | LiquidityHunter, TPSL |
| Microstructure | 8 | OrderFlow (optional) |
| **Total** | ~91 features | - |
## Base Features (21)

### Category: Volatility (8)

| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `volatility_5` | `close.pct_change().rolling(5).std()` | [0, ∞) | 5-period volatility |
| `volatility_10` | `close.pct_change().rolling(10).std()` | [0, ∞) | 10-period volatility |
| `volatility_20` | `close.pct_change().rolling(20).std()` | [0, ∞) | 20-period volatility |
| `volatility_50` | `close.pct_change().rolling(50).std()` | [0, ∞) | 50-period volatility |
| `atr_5` | `TrueRange.rolling(5).mean()` | [0, ∞) | 5-period Average True Range |
| `atr_10` | `TrueRange.rolling(10).mean()` | [0, ∞) | 10-period Average True Range |
| `atr_14` | `TrueRange.rolling(14).mean()` | [0, ∞) | 14-period Average True Range (standard) |
| `atr_ratio` | `atr_14 / atr_14.rolling(50).mean()` | [0, ∞) | Current ATR vs. its average |
```python
import numpy as np
import pandas as pd

def calculate_volatility_features(df):
    features = {}
    # Rolling standard deviation of returns
    for period in [5, 10, 20, 50]:
        features[f'volatility_{period}'] = df['close'].pct_change().rolling(period).std()
    # True Range: max of high-low, |high - prev close|, |low - prev close|
    high_low = df['high'] - df['low']
    high_close = np.abs(df['high'] - df['close'].shift())
    low_close = np.abs(df['low'] - df['close'].shift())
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
    for period in [5, 10, 14]:
        features[f'atr_{period}'] = true_range.rolling(period).mean()
    # Current ATR relative to its 50-period average
    features['atr_ratio'] = features['atr_14'] / features['atr_14'].rolling(50).mean()
    return features
```
### Category: Momentum (6)

| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `momentum_5` | `close - close.shift(5)` | (-∞, ∞) | 5-period momentum |
| `momentum_10` | `close - close.shift(10)` | (-∞, ∞) | 10-period momentum |
| `momentum_20` | `close - close.shift(20)` | (-∞, ∞) | 20-period momentum |
| `roc_5` | `(close / close.shift(5) - 1) * 100` | (-100, ∞) | 5-period Rate of Change |
| `roc_10` | `(close / close.shift(10) - 1) * 100` | (-100, ∞) | 10-period Rate of Change |
| `rsi_14` | See RSI formula | [0, 100] | Relative Strength Index |
```python
def calculate_momentum_features(df):
    features = {}
    # Momentum
    for period in [5, 10, 20]:
        features[f'momentum_{period}'] = df['close'] - df['close'].shift(period)
    # Rate of change (the catalog defines roc_5 and roc_10 only)
    for period in [5, 10]:
        features[f'roc_{period}'] = (df['close'] / df['close'].shift(period) - 1) * 100
    # RSI (simple-moving-average variant of Wilder's RSI)
    delta = df['close'].diff()
    gain = delta.where(delta > 0, 0).rolling(14).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(14).mean()
    rs = gain / loss
    features['rsi_14'] = 100 - (100 / (1 + rs))
    return features
```
### Category: Moving Averages (7)

| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `sma_10` | `close.rolling(10).mean()` | [0, ∞) | 10-period Simple Moving Average |
| `sma_20` | `close.rolling(20).mean()` | [0, ∞) | 20-period Simple Moving Average |
| `sma_50` | `close.rolling(50).mean()` | [0, ∞) | 50-period Simple Moving Average |
| `sma_ratio_10` | `close / sma_10` | [0, ∞) | Price/SMA10 ratio |
| `sma_ratio_20` | `close / sma_20` | [0, ∞) | Price/SMA20 ratio |
| `sma_ratio_50` | `close / sma_50` | [0, ∞) | Price/SMA50 ratio |
| `sma_slope_20` | `sma_20.diff(5) / 5` | (-∞, ∞) | Slope of SMA20 |
```python
def calculate_ma_features(df):
    features = {}
    # SMAs and price/SMA ratios
    for period in [10, 20, 50]:
        features[f'sma_{period}'] = df['close'].rolling(period).mean()
        features[f'sma_ratio_{period}'] = df['close'] / features[f'sma_{period}']
    # Slope of SMA20 over the last 5 bars
    features['sma_slope_20'] = features['sma_20'].diff(5) / 5
    return features
```
## AMD Features (25)

### Category: Price Action (10)

| Feature | Calculation | Range | Use |
|---------|-------------|-------|-----|
| `range_ratio` | `(high - low) / high.rolling(20).mean()` | [0, ∞) | Range compression |
| `range_ma` | `(high - low).rolling(20).mean()` | [0, ∞) | Average range |
| `hl_range_pct` | `(high - low) / close` | [0, 1] | Range as % of price |
| `body_size` | `abs(close - open) / (high - low)` | [0, 1] | Candle body size |
| `upper_wick` | `(high - max(close, open)) / (high - low)` | [0, 1] | Upper wick |
| `lower_wick` | `(min(close, open) - low) / (high - low)` | [0, 1] | Lower wick |
| `buying_pressure` | `(close - low) / (high - low)` | [0, 1] | Buying pressure |
| `selling_pressure` | `(high - close) / (high - low)` | [0, 1] | Selling pressure |
| `close_position` | `(close - low) / (high - low)` | [0, 1] | Position of the close within the bar |
| `range_expansion` | `(high - low) / (high - low).shift(1)` | [0, ∞) | Range expansion |
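The formulas in the table translate almost directly into pandas. The sketch below is illustrative only: the helper name `calculate_price_action_features` and the `eps` guard against zero-range bars are assumptions, not part of the platform API.

```python
import pandas as pd

def calculate_price_action_features(df, eps=1e-8):
    """Candle-anatomy features from the table above (illustrative helper)."""
    features = {}
    rng = df['high'] - df['low']
    features['range_ma'] = rng.rolling(20).mean()
    features['range_ratio'] = rng / df['high'].rolling(20).mean()
    features['hl_range_pct'] = rng / df['close']
    # Candle anatomy, normalized by the bar's full range
    features['body_size'] = (df['close'] - df['open']).abs() / (rng + eps)
    features['upper_wick'] = (df['high'] - df[['close', 'open']].max(axis=1)) / (rng + eps)
    features['lower_wick'] = (df[['close', 'open']].min(axis=1) - df['low']) / (rng + eps)
    features['buying_pressure'] = (df['close'] - df['low']) / (rng + eps)
    features['selling_pressure'] = (df['high'] - df['close']) / (rng + eps)
    # Same formula as buying_pressure, per the catalog
    features['close_position'] = features['buying_pressure']
    features['range_expansion'] = rng / (rng.shift(1) + eps)
    return features
```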
### Category: Volume (8)

| Feature | Calculation | Description |
|---------|-------------|-------------|
| `volume_ratio` | `volume / volume.rolling(20).mean()` | Volume vs. average |
| `volume_trend` | `volume.rolling(10).mean() - volume.rolling(30).mean()` | Volume trend |
| `volume_ma` | `volume.rolling(20).mean()` | Average volume |
| `volume_spike_count` | `(volume > volume_ma * 2).rolling(30).sum()` | Recent spikes |
| `obv` | See OBV calculation | On-Balance Volume |
| `obv_slope` | `obv.diff(5) / 5` | OBV trend |
| `vwap_distance` | `(close - vwap) / close` | Distance to VWAP |
| `volume_on_up` | See calculation | Volume on up-moves |
```python
def calculate_volume_features(df):
    features = {}
    features['volume_ma'] = df['volume'].rolling(20).mean()
    features['volume_ratio'] = df['volume'] / features['volume_ma']
    features['volume_trend'] = df['volume'].rolling(10).mean() - df['volume'].rolling(30).mean()
    features['volume_spike_count'] = (df['volume'] > features['volume_ma'] * 2).rolling(30).sum()
    # OBV: volume signed +1 on up closes, -1 on down closes, then accumulated
    obv = (df['volume'] * ((df['close'] > df['close'].shift(1)).astype(int) * 2 - 1)).cumsum()
    features['obv'] = obv
    features['obv_slope'] = obv.diff(5) / 5
    # VWAP over the full series
    vwap = (df['close'] * df['volume']).cumsum() / df['volume'].cumsum()
    features['vwap_distance'] = (df['close'] - vwap) / df['close']
    # volume_on_up ("See calculation" in the table) is computed elsewhere
    return features
```
### Category: Market Structure (7)

| Feature | Calculation | Use |
|---------|-------------|-----|
| `higher_highs_count` | `(high > high.shift(1)).rolling(10).sum()` | HH count |
| `higher_lows_count` | `(low > low.shift(1)).rolling(10).sum()` | HL count |
| `lower_highs_count` | `(high < high.shift(1)).rolling(10).sum()` | LH count |
| `lower_lows_count` | `(low < low.shift(1)).rolling(10).sum()` | LL count |
| `swing_high_distance` | `(swing_high_20 - close) / close` | Distance to swing high |
| `swing_low_distance` | `(close - swing_low_20) / close` | Distance to swing low |
| `market_structure_score` | See calculation | Structure score |
```python
def calculate_market_structure_features(df):
    features = {}
    # Counts of HH/HL/LH/LL over a 10-bar window
    features['higher_highs_count'] = (df['high'] > df['high'].shift(1)).rolling(10).sum()
    features['higher_lows_count'] = (df['low'] > df['low'].shift(1)).rolling(10).sum()
    features['lower_highs_count'] = (df['high'] < df['high'].shift(1)).rolling(10).sum()
    features['lower_lows_count'] = (df['low'] < df['low'].shift(1)).rolling(10).sum()
    # Distance to the 20-bar swing high/low
    swing_high = df['high'].rolling(20).max()
    swing_low = df['low'].rolling(20).min()
    features['swing_high_distance'] = (swing_high - df['close']) / df['close']
    features['swing_low_distance'] = (df['close'] - swing_low) / df['close']
    # Market structure score in [-1, +1] (-1 bearish, +1 bullish);
    # each count maxes out at 10, so dividing by 20 normalizes each score to [0, 1]
    bullish_score = (features['higher_highs_count'] + features['higher_lows_count']) / 20
    bearish_score = (features['lower_highs_count'] + features['lower_lows_count']) / 20
    features['market_structure_score'] = bullish_score - bearish_score
    return features
```
## ICT Features (15)

### Category: OTE & Fibonacci (5)

| Feature | Calculation | Range | Description |
|---------|-------------|-------|-------------|
| `ote_position` | `(close - swing_low) / (swing_high - swing_low)` | [0, 1] | Position within the range |
| `in_discount_zone` | `1 if ote_position < 0.38 else 0` | {0, 1} | In discount zone |
| `in_premium_zone` | `1 if ote_position > 0.62 else 0` | {0, 1} | In premium zone |
| `in_ote_buy_zone` | `1 if 0.62 <= ote_position <= 0.79 else 0` | {0, 1} | In buy-side OTE |
| `fib_distance_50` | `abs(ote_position - 0.5)` | [0, 0.5] | Distance to equilibrium |
### Category: Killzones & Timing (5)

| Feature | Calculation | Description |
|---------|-------------|-------------|
| `is_london_kz` | Based on EST hour | London killzone |
| `is_ny_kz` | Based on EST hour | NY killzone |
| `is_asian_kz` | Based on EST hour | Asian killzone |
| `session_strength` | 0-1 per killzone | Session strength |
| `session_overlap` | Overlap detection | London/NY overlap |
```python
def calculate_ict_features(df):
    features = pd.DataFrame(index=df.index)
    # OTE position within the 50-bar range
    swing_high = df['high'].rolling(50).max()
    swing_low = df['low'].rolling(50).min()
    range_size = swing_high - swing_low
    features['ote_position'] = (df['close'] - swing_low) / (range_size + 1e-8)
    features['in_discount_zone'] = (features['ote_position'] < 0.38).astype(int)
    features['in_premium_zone'] = (features['ote_position'] > 0.62).astype(int)
    features['in_ote_buy_zone'] = (
        (features['ote_position'] >= 0.62) & (features['ote_position'] <= 0.79)
    ).astype(int)
    features['fib_distance_50'] = np.abs(features['ote_position'] - 0.5)
    # Killzones (requires a tz-aware DatetimeIndex)
    hour_est = df.index.tz_convert('America/New_York').hour
    features['is_london_kz'] = ((hour_est >= 2) & (hour_est < 5)).astype(int)
    features['is_ny_kz'] = ((hour_est >= 8) & (hour_est < 11)).astype(int)
    features['is_asian_kz'] = (hour_est >= 20).astype(int)  # 20:00-00:00 EST
    # Session strength by killzone (default 0.1 outside all killzones)
    features['session_strength'] = np.select(
        [features['is_ny_kz'] == 1,
         features['is_london_kz'] == 1,
         features['is_asian_kz'] == 1],
        [1.0, 0.9, 0.3],
        default=0.1,
    )
    # London close + NY open overlap
    features['session_overlap'] = ((hour_est >= 10) & (hour_est < 12)).astype(int)
    return features
```
### Category: Ranges (5)

| Feature | Calculation | Range |
|---------|-------------|-------|
| `weekly_range_position` | Position within the weekly range | 0-1 |
| `daily_range_position` | Position within the daily range | 0-1 |
| `weekly_range_size` | Weekly high - low | Absolute |
| `daily_range_size` | Daily high - low | Absolute |
| `range_expansion_daily` | Current range / average range ratio | >1 = expansion |
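A minimal sketch of how these range features could be computed, assuming a tz-aware `DatetimeIndex`. The helper name `calculate_range_features`, the expanding `cummax`/`cummin` within each calendar period (which avoids look-ahead inside the day or week), and the 20-bar averaging window for `range_expansion_daily` are assumptions, not the platform's documented implementation.

```python
import pandas as pd

def calculate_range_features(df):
    """Daily/weekly range positions and sizes (illustrative helper)."""
    features = pd.DataFrame(index=df.index)
    for name, freq in [('daily', 'D'), ('weekly', 'W')]:
        grouper = pd.Grouper(freq=freq)
        # Expanding high/low within each period: only past bars of the period are used
        period_high = df.groupby(grouper)['high'].cummax()
        period_low = df.groupby(grouper)['low'].cummin()
        size = period_high - period_low
        features[f'{name}_range_size'] = size
        features[f'{name}_range_position'] = (df['close'] - period_low) / (size + 1e-8)
    # Current daily range vs. its average (>1 = expansion); 20-bar window is assumed
    daily = features['daily_range_size']
    features['range_expansion_daily'] = daily / (daily.rolling(20).mean() + 1e-8)
    return features
```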
## SMC Features (12)

### Category: Structure Breaks (6)

| Feature | Calculation | Use |
|---------|-------------|-----|
| `choch_bullish_count` | Count over a 30-bar window | Bullish CHoCHs |
| `choch_bearish_count` | Count over a 30-bar window | Bearish CHoCHs |
| `bos_bullish_count` | Count over a 30-bar window | Bullish BOS |
| `bos_bearish_count` | Count over a 30-bar window | Bearish BOS |
| `choch_recency` | Bars since last CHoCH | 0 = very recent |
| `bos_recency` | Bars since last BOS | 0 = very recent |
```python
def calculate_smc_features(df):
    """CHoCH/BOS features. detect_choch, detect_bos, count_signals_in_window
    and bars_since_last_signal are helpers from the SMC detection module."""
    features = {}
    # Detect CHoCH and BOS events
    choch_signals = detect_choch(df, window=20)
    bos_signals = detect_bos(df, window=20)
    # Count by type over a 30-bar window
    features['choch_bullish_count'] = count_signals_in_window(
        choch_signals, 'bullish_choch', window=30
    )
    features['choch_bearish_count'] = count_signals_in_window(
        choch_signals, 'bearish_choch', window=30
    )
    features['bos_bullish_count'] = count_signals_in_window(
        bos_signals, 'bullish_bos', window=30
    )
    features['bos_bearish_count'] = count_signals_in_window(
        bos_signals, 'bearish_bos', window=30
    )
    # Recency: bars since the last signal
    features['choch_recency'] = bars_since_last_signal(choch_signals)
    features['bos_recency'] = bars_since_last_signal(bos_signals)
    return features
```
### Category: Displacement & Flow (6)

| Feature | Calculation | Description |
|---------|-------------|-------------|
| `displacement_strength` | Move / ATR | Displacement strength |
| `displacement_direction` | 1 = bullish, -1 = bearish, 0 = neutral | Direction |
| `displacement_recency` | Bars since last | Recency |
| `inducement_count` | Count over a 20-bar window | Detected inducements |
| `inducement_bullish` | Bullish inducement count | Bullish traps |
| `inducement_bearish` | Bearish inducement count | Bearish traps |
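One plausible way to derive the displacement features, shown as an illustrative sketch: displacement is measured here as the candle body relative to ATR, and the `threshold` parameter is an assumed value, not a documented platform setting. The inducement features require the SMC inducement detector and are omitted.

```python
import numpy as np
import pandas as pd

def calculate_displacement_features(df, atr_period=14, threshold=2.0):
    """Displacement strength, direction, and recency (illustrative sketch)."""
    features = pd.DataFrame(index=df.index)
    # ATR from true range
    tr = pd.concat([
        df['high'] - df['low'],
        (df['high'] - df['close'].shift()).abs(),
        (df['low'] - df['close'].shift()).abs(),
    ], axis=1).max(axis=1)
    atr = tr.rolling(atr_period).mean()
    move = df['close'] - df['open']
    features['displacement_strength'] = move.abs() / (atr + 1e-8)
    # Direction only counts when the move exceeds the ATR threshold
    is_displacement = features['displacement_strength'] > threshold
    features['displacement_direction'] = np.sign(move).where(is_displacement, 0)
    # Bars since the last displacement candle (0 at the candle itself)
    groups = is_displacement.cumsum()
    features['displacement_recency'] = (~is_displacement).groupby(groups).cumsum()
    return features
```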
## Liquidity Features (10)

| Feature | Calculation | Range | Description |
|---------|-------------|-------|-------------|
| `bsl_distance` | `(bsl_level - close) / close` | [0, ∞) | Distance to BSL |
| `ssl_distance` | `(close - ssl_level) / close` | [0, ∞) | Distance to SSL |
| `bsl_density` | Count of nearby BSL levels | [0, ∞) | BSL density |
| `ssl_density` | Count of nearby SSL levels | [0, ∞) | SSL density |
| `bsl_strength` | Volume at the BSL level | [0, ∞) | BSL strength |
| `ssl_strength` | Volume at the SSL level | [0, ∞) | SSL strength |
| `liquidity_grab_count` | Count of recent sweeps | [0, ∞) | Recent sweeps |
| `bsl_sweep_recent` | 1 if recently swept | {0, 1} | BSL swept |
| `ssl_sweep_recent` | 1 if recently swept | {0, 1} | SSL swept |
| `near_liquidity` | 1 if within 1% of a level | {0, 1} | Near liquidity |
```python
def calculate_liquidity_features(df, lookback=20):
    """Liquidity features. find_liquidity_levels and detect_liquidity_sweeps
    are helpers from the liquidity module."""
    features = {}
    # BSL (Buy Side Liquidity): resting liquidity above swing highs
    bsl_levels = find_liquidity_levels(df, 'high', lookback)
    features['bsl_distance'] = (bsl_levels['nearest'] - df['close']) / df['close']
    features['bsl_density'] = bsl_levels['density']
    features['bsl_strength'] = bsl_levels['strength']
    # SSL (Sell Side Liquidity): resting liquidity below swing lows
    ssl_levels = find_liquidity_levels(df, 'low', lookback)
    features['ssl_distance'] = (df['close'] - ssl_levels['nearest']) / df['close']
    features['ssl_density'] = ssl_levels['density']
    features['ssl_strength'] = ssl_levels['strength']
    # Sweeps over the last 30 bars
    sweeps = detect_liquidity_sweeps(df, window=30)
    features['liquidity_grab_count'] = len(sweeps)
    features['bsl_sweep_recent'] = int(any(s['type'] == 'bsl' for s in sweeps[-5:]))
    features['ssl_sweep_recent'] = int(any(s['type'] == 'ssl' for s in sweeps[-5:]))
    # Within 1% of either liquidity level
    features['near_liquidity'] = (
        (features['bsl_distance'] < 0.01) | (features['ssl_distance'] < 0.01)
    ).astype(int)
    return features
```
## Microstructure Features (8)

**Note:** Requires granular volume data or tick data.

| Feature | Calculation | Description |
|---------|-------------|-------------|
| `volume_delta` | `buy_volume - sell_volume` | Volume delta |
| `cumulative_volume_delta` | Accumulated CVD | CVD |
| `cvd_slope` | `cvd.diff(5) / 5` | CVD trend |
| `tick_imbalance` | `(upticks - downticks) / total_ticks` | Tick imbalance |
| `large_orders_count` | Count of large orders | Institutional activity |
| `order_flow_imbalance` | Buy/sell ratio | -1 to +1 |
| `poc_distance` | Distance to Point of Control | Distance to POC |
| `hvn_proximity` | Distance to High Volume Node | High-volume zone |
```python
def calculate_microstructure_features(df):
    """Requires extended data: buy_volume, sell_volume, tick data.
    calculate_volume_profile is a helper from the volume-profile module."""
    features = {}
    if 'buy_volume' in df.columns and 'sell_volume' in df.columns:
        features['volume_delta'] = df['buy_volume'] - df['sell_volume']
        features['cumulative_volume_delta'] = features['volume_delta'].cumsum()
        features['cvd_slope'] = features['cumulative_volume_delta'].diff(5) / 5
        total_volume = df['buy_volume'] + df['sell_volume']
        features['order_flow_imbalance'] = features['volume_delta'] / (total_volume + 1e-8)
    # Large orders: volume above 2x its 20-bar average
    threshold = df['volume'].rolling(20).mean() * 2
    features['large_orders_count'] = (df['volume'] > threshold).rolling(30).sum()
    # Volume profile
    volume_profile = calculate_volume_profile(df, bins=50)
    features['poc_distance'] = (df['close'] - volume_profile['poc']) / df['close']
    return features
```
## Targets for Models

### Target 1: AMD Phase (AMDDetector)

```python
TARGET_AMD_PHASE = {
    0: 'neutral',
    1: 'accumulation',
    2: 'manipulation',
    3: 'distribution'
}

def label_amd_phase(df, i, forward_window=20):
    """See ESTRATEGIA-AMD-COMPLETA.md for the labeling logic."""
    # Full implementation lives in the AMD document
    pass
```
### Target 2: Delta High/Low (RangePredictor)

```python
# Regression targets
TARGETS_RANGE = {
    'delta_high_15m': float,  # Continuous prediction
    'delta_low_15m': float,
    'delta_high_1h': float,
    'delta_low_1h': float,
    # Classification targets (bins)
    'bin_high_15m': int,  # 0-3
    'bin_low_15m': int,
    'bin_high_1h': int,
    'bin_low_1h': int
}

def calculate_range_targets(df, horizons={'15m': 3, '1h': 12}):
    targets = {}
    # ATR as a fraction of price, so it shares units with the deltas below
    atr_pct = calculate_atr(df, 14) / df['close']
    for name, periods in horizons.items():
        # Delta high: forward max high relative to the current close
        targets[f'delta_high_{name}'] = (
            df['high'].rolling(periods).max().shift(-periods) - df['close']
        ) / df['close']
        # Delta low: current close relative to the forward min low
        targets[f'delta_low_{name}'] = (
            df['close'] - df['low'].rolling(periods).min().shift(-periods)
        ) / df['close']
        # Bins (delta normalized by ATR)
        def to_bin(delta_series):
            ratio = delta_series / atr_pct
            bins = pd.cut(
                ratio,
                bins=[-np.inf, 0.3, 0.7, 1.2, np.inf],
                labels=[0, 1, 2, 3]
            )
            return bins.astype(float)
        targets[f'bin_high_{name}'] = to_bin(targets[f'delta_high_{name}'])
        targets[f'bin_low_{name}'] = to_bin(targets[f'delta_low_{name}'])
    return pd.DataFrame(targets)
```
### Target 3: TP vs SL (TPSLClassifier)

```python
TARGETS_TPSL = {
    'tp_first_15m_rr_2_1': int,  # 0 or 1
    'tp_first_15m_rr_3_1': int,
    'tp_first_1h_rr_2_1': int,
    'tp_first_1h_rr_3_1': int
}

def calculate_tpsl_targets(df, rr_configs):
    """Simulates whether TP is reached before SL (long entries)."""
    targets = {}
    atr = calculate_atr(df, 14)
    for rr in rr_configs:
        sl_dist = atr * rr['sl_atr_multiple']
        tp_dist = atr * rr['tp_atr_multiple']
        def check_tp_first(i, horizon_bars):
            if i + horizon_bars >= len(df):
                return np.nan
            entry_price = df['close'].iloc[i]
            sl_price = entry_price - sl_dist.iloc[i]
            tp_price = entry_price + tp_dist.iloc[i]
            future = df.iloc[i+1:i+horizon_bars+1]
            for _, row in future.iterrows():
                # SL is checked first within each bar: conservative when both are hit
                if row['low'] <= sl_price:
                    return 0  # SL hit first
                elif row['high'] >= tp_price:
                    return 1  # TP hit first
            return np.nan  # Neither hit
        for horizon_name, horizon_bars in [('15m', 3), ('1h', 12)]:
            target_name = f'tp_first_{horizon_name}_{rr["name"]}'
            targets[target_name] = [
                check_tp_first(i, horizon_bars) for i in range(len(df))
            ]
    return pd.DataFrame(targets)
```
### Target 4: Liquidity Sweep (LiquidityHunter)

```python
TARGETS_LIQUIDITY = {
    'bsl_sweep': int,    # 0 or 1
    'ssl_sweep': int,
    'any_sweep': int,
    'sweep_timing': int  # Bars until the sweep
}

def label_liquidity_sweep(df, i, forward_window=10):
    """Labels whether a liquidity sweep occurs in the forward window."""
    if i + forward_window >= len(df):
        return {'bsl_sweep': np.nan, 'ssl_sweep': np.nan}
    swing_high = df['high'].iloc[max(0, i-20):i].max()
    swing_low = df['low'].iloc[max(0, i-20):i].min()
    future = df.iloc[i:i+forward_window]
    # BSL sweep (sweep of highs, 0.5% beyond the swing)
    bsl_hits = (future['high'] >= swing_high * 1.005).to_numpy()
    bsl_swept = bsl_hits.any()
    # SSL sweep (sweep of lows)
    ssl_hits = (future['low'] <= swing_low * 0.995).to_numpy()
    ssl_swept = ssl_hits.any()
    # Timing: bars until the first sweep
    if bsl_swept:
        sweep_timing = int(bsl_hits.argmax())
    elif ssl_swept:
        sweep_timing = int(ssl_hits.argmax())
    else:
        sweep_timing = np.nan
    return {
        'bsl_sweep': 1 if bsl_swept else 0,
        'ssl_sweep': 1 if ssl_swept else 0,
        'any_sweep': 1 if (bsl_swept or ssl_swept) else 0,
        'sweep_timing': sweep_timing
    }
```
### Target 5: Order Flow (OrderFlowAnalyzer)

```python
TARGETS_ORDER_FLOW = {
    'flow_type': int,                # 0=neutral, 1=accumulation, 2=distribution
    'institutional_activity': float  # 0-1 score
}

def label_order_flow(df, i, forward_window=50):
    """Based on CVD and large orders."""
    if 'cumulative_volume_delta' not in df.columns:
        return {'flow_type': 0, 'institutional_activity': 0.0}
    if i + forward_window >= len(df):
        return {'flow_type': np.nan, 'institutional_activity': np.nan}
    current_cvd = df['cumulative_volume_delta'].iloc[i]
    future_cvd = df['cumulative_volume_delta'].iloc[i + forward_window]
    cvd_change = future_cvd - current_cvd
    # Large orders in the window
    large_orders = df['large_orders_count'].iloc[i:i+forward_window].sum()
    if cvd_change > 0 and large_orders > 5:
        flow_type = 1  # accumulation
    elif cvd_change < 0 and large_orders > 5:
        flow_type = 2  # distribution
    else:
        flow_type = 0  # neutral
    institutional_activity = min(1.0, large_orders / 10)
    return {
        'flow_type': flow_type,
        'institutional_activity': institutional_activity
    }
```
## Feature Engineering Pipeline

### Full Pipeline

```python
class FeatureEngineeringPipeline:
    """End-to-end feature engineering pipeline."""

    def __init__(self, config=None):
        self.config = config or {}
        self.scalers = {}

    def transform(self, df):
        """Transforms raw OHLCV into the full feature set."""
        features = pd.DataFrame(index=df.index)
        # 1. Base features
        print("Extracting base features...")
        base = self._extract_base_features(df)
        features = pd.concat([features, base], axis=1)
        # 2. AMD features
        print("Extracting AMD features...")
        amd = self._extract_amd_features(df)
        features = pd.concat([features, amd], axis=1)
        # 3. ICT features
        print("Extracting ICT features...")
        ict = self._extract_ict_features(df)
        features = pd.concat([features, ict], axis=1)
        # 4. SMC features
        print("Extracting SMC features...")
        smc = self._extract_smc_features(df)
        features = pd.concat([features, smc], axis=1)
        # 5. Liquidity features
        print("Extracting liquidity features...")
        liquidity = self._extract_liquidity_features(df)
        features = pd.concat([features, liquidity], axis=1)
        # 6. Microstructure (if available)
        if 'buy_volume' in df.columns:
            print("Extracting microstructure features...")
            micro = self._extract_microstructure_features(df)
            features = pd.concat([features, micro], axis=1)
        # 7. Scaling
        print("Scaling features...")
        features_scaled = self._scale_features(features)
        # 8. Handle missing values
        features_scaled = features_scaled.ffill().fillna(0)
        return features_scaled

    def _extract_base_features(self, df):
        """Extracts the 21 base features."""
        features = {}
        features.update(calculate_volatility_features(df))  # volatility
        features.update(calculate_momentum_features(df))    # momentum
        features.update(calculate_ma_features(df))          # moving averages
        return pd.DataFrame(features)

    def _scale_features(self, features):
        """Scales features with a per-column RobustScaler."""
        from sklearn.preprocessing import RobustScaler
        if not self.scalers:
            # Fit scalers (training pass)
            for col in features.columns:
                self.scalers[col] = RobustScaler()
                features[col] = self.scalers[col].fit_transform(
                    features[col].values.reshape(-1, 1)
                )
        else:
            # Transform with already-fitted scalers (inference pass)
            for col in features.columns:
                if col in self.scalers:
                    features[col] = self.scalers[col].transform(
                        features[col].values.reshape(-1, 1)
                    )
        return features
```
### Pipeline Usage

```python
# Initialize
pipeline = FeatureEngineeringPipeline()

# Transform data
df_raw = load_ohlcv_data('BTCUSDT', '5m')
features = pipeline.transform(df_raw)

print(f"Features shape: {features.shape}")
print(f"Features: {features.columns.tolist()}")

# Features ready for ML models
X = features.values
```
## Technical Considerations

### 1. Preventing Look-Ahead Bias

**IMPORTANT:** Never use future data to compute features.

```python
# ✅ CORRECT
sma_20 = df['close'].rolling(20).mean()

# ❌ INCORRECT
sma_20 = df['close'].rolling(20, center=True).mean()  # Uses future data!
```
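A cheap sanity check for look-ahead bias is to recompute a feature on truncated history and compare it with the full-history values at the same bars; any difference means the feature reads the future. The helper below is an illustrative utility, not part of the pipeline.

```python
import numpy as np

def has_lookahead_bias(feature_fn, df, check_at=-10):
    """Recompute the feature on truncated history and compare with the full run.
    If values (or NaN positions) at past bars change once future bars are
    removed, the feature leaks future information."""
    full = feature_fn(df).iloc[:check_at]
    truncated = feature_fn(df.iloc[:check_at])
    # NaN positions must match on the overlap
    if not (full.isna() == truncated.isna()).all():
        return True
    mask = full.notna()
    return not np.allclose(full[mask], truncated[mask])
```

For example, `has_lookahead_bias(lambda d: d['close'].rolling(20).mean(), df)` is False, while the `center=True` variant above is flagged as True.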
### 2. Handling Missing Values

```python
def handle_missing(features):
    """Imputation strategy."""
    # 1. Forward fill (carry the last known value)
    features = features.ffill()
    # 2. Any NaNs remaining at the start become 0
    features = features.fillna(0)
    # 3. Alternative: impute with the median
    # features = features.fillna(features.median())
    return features
```
### 3. Feature Scaling

```python
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler

# Price-based features → RobustScaler (handles outliers)
price_scaler = RobustScaler()

# Indicators → StandardScaler
indicator_scaler = StandardScaler()

# Ratios/percentages → MinMaxScaler
ratio_scaler = MinMaxScaler(feature_range=(0, 1))
```
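The three per-group scalers can be wired into a single transformer with scikit-learn's `ColumnTransformer`. This is a sketch: the function name and the column groupings are hypothetical placeholders for the catalog's actual feature lists.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler

def build_feature_scaler(price_cols, indicator_cols, ratio_cols):
    """One scaler per feature group, applied column-wise (illustrative helper)."""
    return ColumnTransformer([
        ('price', RobustScaler(), price_cols),            # robust to price outliers
        ('indicator', StandardScaler(), indicator_cols),  # zero mean, unit variance
        ('ratio', MinMaxScaler((0, 1)), ratio_cols),      # map to [0, 1]
    ], remainder='passthrough')
```

Fit it on the training split only, then reuse the fitted transformer on validation and test data to avoid leakage.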
### 4. Feature Selection

```python
import pandas as pd

def select_important_features(X, y, feature_names, model, top_n=50):
    """Selects the most important features by model importance."""
    # Train the model
    model.fit(X, y)
    # Rank by importance
    importance = pd.DataFrame({
        'feature': feature_names,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)
    # Keep the top N
    return importance.head(top_n)['feature'].tolist()
```
### 5. Temporal Validation

```python
def temporal_validation_split(df, train_pct=0.7, val_pct=0.15):
    """Strict temporal split (no shuffling)."""
    n = len(df)
    train_end = int(n * train_pct)
    val_end = int(n * (train_pct + val_pct))
    df_train = df.iloc[:train_end]
    df_val = df.iloc[train_end:val_end]
    df_test = df.iloc[val_end:]
    # Verify there is no overlap
    assert df_train.index[-1] < df_val.index[0]
    assert df_val.index[-1] < df_test.index[0]
    return df_train, df_val, df_test
```
## Dimension Summary

| Category | Features | Models |
|----------|----------|--------|
| Technical Base | 21 | All |
| AMD | 25 | AMD, Range, TPSL |
| ICT | 15 | Range, TPSL |
| SMC | 12 | Range, TPSL |
| Liquidity | 10 | Liquidity, TPSL |
| Microstructure | 8 | OrderFlow |
| **TOTAL** | 91 features | - |
**Document generated:** 2025-12-05
**Next review:** 2025-Q1
**Contact:** ml-engineering@trading.ai