---
id: "FEATURES-TARGETS-ML"
title: "Feature and Target Catalog - Machine Learning"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---
# Feature and Target Catalog - Machine Learning
**Version:** 1.0.0
**Date:** 2025-12-05
**Module:** OQI-006-ml-signals
**Author:** Trading Strategist - Trading Platform
---
## Table of Contents
1. [Introduction](#introduction)
2. [Base Features (21)](#base-features-21)
3. [AMD Features (25)](#amd-features-25)
4. [ICT Features (15)](#ict-features-15)
5. [SMC Features (12)](#smc-features-12)
6. [Liquidity Features (10)](#liquidity-features-10)
7. [Microstructure Features (8)](#microstructure-features-8)
8. [Model Targets](#model-targets)
9. [Feature Engineering Pipeline](#feature-engineering-pipeline)
10. [Technical Considerations](#technical-considerations)
---
## Introduction
This document defines the complete catalog of features (input variables) and targets (prediction objectives) used by the Trading Platform ML models.
### Total Dimensions
| Category | Features | Models that use them |
|-----------|----------|---------------------|
| **Base Technical** | 21 | All |
| **AMD** | 25 | AMDDetector, Range, TPSL |
| **ICT** | 15 | Range, TPSL, Orchestrator |
| **SMC** | 12 | Range, TPSL, Orchestrator |
| **Liquidity** | 10 | LiquidityHunter, TPSL |
| **Microstructure** | 8 | OrderFlow (optional) |
| **Total** | ~91 features | - |
---
## Base Features (21)
### Category: Volatility (8)
| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `volatility_5` | `close.pct_change().rolling(5).std()` | [0, ∞) | 5-period volatility |
| `volatility_10` | `close.pct_change().rolling(10).std()` | [0, ∞) | 10-period volatility |
| `volatility_20` | `close.pct_change().rolling(20).std()` | [0, ∞) | 20-period volatility |
| `volatility_50` | `close.pct_change().rolling(50).std()` | [0, ∞) | 50-period volatility |
| `atr_5` | `TrueRange.rolling(5).mean()` | [0, ∞) | 5-period Average True Range |
| `atr_10` | `TrueRange.rolling(10).mean()` | [0, ∞) | 10-period Average True Range |
| `atr_14` | `TrueRange.rolling(14).mean()` | [0, ∞) | 14-period Average True Range (standard) |
| `atr_ratio` | `atr_14 / atr_14.rolling(50).mean()` | [0, ∞) | Current ATR vs its average |
```python
import numpy as np
import pandas as pd

def calculate_volatility_features(df):
    features = {}
    for period in [5, 10, 20, 50]:
        features[f'volatility_{period}'] = df['close'].pct_change().rolling(period).std()
    # ATR
    high_low = df['high'] - df['low']
    high_close = np.abs(df['high'] - df['close'].shift())
    low_close = np.abs(df['low'] - df['close'].shift())
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
    for period in [5, 10, 14]:
        features[f'atr_{period}'] = true_range.rolling(period).mean()
    features['atr_ratio'] = features['atr_14'] / features['atr_14'].rolling(50).mean()
    return features
```
### Category: Momentum (6)
| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `momentum_5` | `close - close.shift(5)` | (-∞, ∞) | 5-period momentum |
| `momentum_10` | `close - close.shift(10)` | (-∞, ∞) | 10-period momentum |
| `momentum_20` | `close - close.shift(20)` | (-∞, ∞) | 20-period momentum |
| `roc_5` | `(close / close.shift(5) - 1) * 100` | (-100, ∞) | 5-period Rate of Change |
| `roc_10` | `(close / close.shift(10) - 1) * 100` | (-100, ∞) | 10-period Rate of Change |
| `rsi_14` | See RSI formula | [0, 100] | Relative Strength Index |
```python
def calculate_momentum_features(df):
    features = {}
    # Momentum
    for period in [5, 10, 20]:
        features[f'momentum_{period}'] = df['close'] - df['close'].shift(period)
    # Rate of Change (only 5 and 10, matching the table above)
    for period in [5, 10]:
        features[f'roc_{period}'] = (df['close'] / df['close'].shift(period) - 1) * 100
    # RSI
    delta = df['close'].diff()
    gain = delta.where(delta > 0, 0).rolling(14).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(14).mean()
    rs = gain / loss
    features['rsi_14'] = 100 - (100 / (1 + rs))
    return features
```
### Category: Moving Averages (7)
| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `sma_10` | `close.rolling(10).mean()` | [0, ∞) | 10-period Simple Moving Average |
| `sma_20` | `close.rolling(20).mean()` | [0, ∞) | 20-period Simple Moving Average |
| `sma_50` | `close.rolling(50).mean()` | [0, ∞) | 50-period Simple Moving Average |
| `sma_ratio_10` | `close / sma_10` | [0, ∞) | Price/SMA10 ratio |
| `sma_ratio_20` | `close / sma_20` | [0, ∞) | Price/SMA20 ratio |
| `sma_ratio_50` | `close / sma_50` | [0, ∞) | Price/SMA50 ratio |
| `sma_slope_20` | `sma_20.diff(5) / 5` | (-∞, ∞) | SMA20 slope |
```python
def calculate_ma_features(df):
    features = {}
    for period in [10, 20, 50]:
        features[f'sma_{period}'] = df['close'].rolling(period).mean()
        features[f'sma_ratio_{period}'] = df['close'] / features[f'sma_{period}']
    features['sma_slope_20'] = features['sma_20'].diff(5) / 5
    return features
```
---
## AMD Features (25)
### Category: Price Action (10)
| Feature | Calculation | Range | Use |
|---------|---------|-------|-----|
| `range_ratio` | `(high - low) / high.rolling(20).mean()` | [0, ∞) | Range compression |
| `range_ma` | `(high - low).rolling(20).mean()` | [0, ∞) | Average range |
| `hl_range_pct` | `(high - low) / close` | [0, 1] | Range as % of price |
| `body_size` | `abs(close - open) / (high - low)` | [0, 1] | Candle body size |
| `upper_wick` | `(high - max(close, open)) / (high - low)` | [0, 1] | Upper wick |
| `lower_wick` | `(min(close, open) - low) / (high - low)` | [0, 1] | Lower wick |
| `buying_pressure` | `(close - low) / (high - low)` | [0, 1] | Buying pressure |
| `selling_pressure` | `(high - close) / (high - low)` | [0, 1] | Selling pressure |
| `close_position` | `(close - low) / (high - low)` | [0, 1] | Close position in range |
| `range_expansion` | `(high - low) / (high - low).shift(1)` | [0, ∞) | Range expansion |
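The table above can be sketched directly in pandas; this is a minimal illustration following the table's formulas (the zero-range guard on flat bars is an addition, and `close_position` uses the same formula as `buying_pressure`, as the table states):
```python
import numpy as np
import pandas as pd

def calculate_price_action_features(df):
    """Sketch of the Price Action features; names follow the table above."""
    hl = df['high'] - df['low']
    rng = hl.replace(0, np.nan)  # guard: avoid division by zero on flat bars
    features = {}
    features['range_ma'] = hl.rolling(20).mean()
    features['range_ratio'] = hl / df['high'].rolling(20).mean()
    features['hl_range_pct'] = hl / df['close']
    features['body_size'] = (df['close'] - df['open']).abs() / rng
    features['upper_wick'] = (df['high'] - df[['close', 'open']].max(axis=1)) / rng
    features['lower_wick'] = (df[['close', 'open']].min(axis=1) - df['low']) / rng
    features['buying_pressure'] = (df['close'] - df['low']) / rng
    features['selling_pressure'] = (df['high'] - df['close']) / rng
    features['close_position'] = features['buying_pressure']  # identical formula per the table
    features['range_expansion'] = hl / hl.shift(1)
    return features
```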
### Category: Volume (8)
| Feature | Calculation | Description |
|---------|---------|---------------|
| `volume_ratio` | `volume / volume.rolling(20).mean()` | Volume vs its average |
| `volume_trend` | `volume.rolling(10).mean() - volume.rolling(30).mean()` | Volume trend |
| `volume_ma` | `volume.rolling(20).mean()` | Average volume |
| `volume_spike_count` | `(volume > volume_ma * 2).rolling(30).sum()` | Recent spikes |
| `obv` | See OBV calculation | On-Balance Volume |
| `obv_slope` | `obv.diff(5) / 5` | OBV trend |
| `vwap_distance` | `(close - vwap) / close` | Distance to VWAP |
| `volume_on_up` | See calculation | Volume on up bars |
```python
def calculate_volume_features(df):
    features = {}
    features['volume_ratio'] = df['volume'] / df['volume'].rolling(20).mean()
    features['volume_trend'] = df['volume'].rolling(10).mean() - df['volume'].rolling(30).mean()
    features['volume_ma'] = df['volume'].rolling(20).mean()
    features['volume_spike_count'] = (df['volume'] > features['volume_ma'] * 2).rolling(30).sum()
    # OBV: add volume on up closes, subtract on down closes
    obv = (df['volume'] * ((df['close'] > df['close'].shift(1)).astype(int) * 2 - 1)).cumsum()
    features['obv'] = obv
    features['obv_slope'] = obv.diff(5) / 5
    # VWAP
    vwap = (df['close'] * df['volume']).cumsum() / df['volume'].cumsum()
    features['vwap_distance'] = (df['close'] - vwap) / df['close']
    # volume_on_up: share of recent volume traded on up bars
    # (one common definition, added here since the table lists the feature)
    up = df['close'] > df['close'].shift(1)
    features['volume_on_up'] = (
        df['volume'].where(up, 0).rolling(20).sum() / df['volume'].rolling(20).sum()
    )
    return features
```
### Category: Market Structure (7)
| Feature | Calculation | Use |
|---------|---------|-----|
| `higher_highs_count` | `(high > high.shift(1)).rolling(10).sum()` | HH count |
| `higher_lows_count` | `(low > low.shift(1)).rolling(10).sum()` | HL count |
| `lower_highs_count` | `(high < high.shift(1)).rolling(10).sum()` | LH count |
| `lower_lows_count` | `(low < low.shift(1)).rolling(10).sum()` | LL count |
| `swing_high_distance` | `(swing_high_20 - close) / close` | Distance to swing high |
| `swing_low_distance` | `(close - swing_low_20) / close` | Distance to swing low |
| `market_structure_score` | See calculation | Structure score |
```python
def calculate_market_structure_features(df):
    features = {}
    features['higher_highs_count'] = (df['high'] > df['high'].shift(1)).rolling(10).sum()
    features['higher_lows_count'] = (df['low'] > df['low'].shift(1)).rolling(10).sum()
    features['lower_highs_count'] = (df['high'] < df['high'].shift(1)).rolling(10).sum()
    features['lower_lows_count'] = (df['low'] < df['low'].shift(1)).rolling(10).sum()
    swing_high = df['high'].rolling(20).max()
    swing_low = df['low'].rolling(20).min()
    features['swing_high_distance'] = (swing_high - df['close']) / df['close']
    features['swing_low_distance'] = (df['close'] - swing_low) / df['close']
    # Market structure score (-1 bearish, +1 bullish)
    bullish_score = (features['higher_highs_count'] + features['higher_lows_count']) / 20
    bearish_score = (features['lower_highs_count'] + features['lower_lows_count']) / 20
    features['market_structure_score'] = bullish_score - bearish_score
    return features
```
---
## ICT Features (15)
### Category: OTE & Fibonacci (5)
| Feature | Calculation | Range | Description |
|---------|---------|-------|---------------|
| `ote_position` | `(close - swing_low) / (swing_high - swing_low)` | [0, 1] | Position within range |
| `in_discount_zone` | `1 if ote_position < 0.38 else 0` | {0, 1} | In discount zone |
| `in_premium_zone` | `1 if ote_position > 0.62 else 0` | {0, 1} | In premium zone |
| `in_ote_buy_zone` | `1 if 0.62 <= ote_position <= 0.79 else 0` | {0, 1} | In OTE buy zone |
| `fib_distance_50` | `abs(ote_position - 0.5)` | [0, 0.5] | Distance to equilibrium |
### Category: Killzones & Timing (5)
| Feature | Calculation | Description |
|---------|---------|---------------|
| `is_london_kz` | Based on EST hour | London killzone |
| `is_ny_kz` | Based on EST hour | NY killzone |
| `is_asian_kz` | Based on EST hour | Asian killzone |
| `session_strength` | 0-1 by killzone | Session strength |
| `session_overlap` | Overlap detection | London/NY overlap |
```python
import numpy as np
import pandas as pd

def calculate_ict_features(df):
    # Use a DataFrame (not a dict) so the .loc assignments below work
    features = pd.DataFrame(index=df.index)
    # OTE position
    swing_high = df['high'].rolling(50).max()
    swing_low = df['low'].rolling(50).min()
    range_size = swing_high - swing_low
    features['ote_position'] = (df['close'] - swing_low) / (range_size + 1e-8)
    features['in_discount_zone'] = (features['ote_position'] < 0.38).astype(int)
    features['in_premium_zone'] = (features['ote_position'] > 0.62).astype(int)
    features['in_ote_buy_zone'] = (
        (features['ote_position'] >= 0.62) & (features['ote_position'] <= 0.79)
    ).astype(int)
    features['fib_distance_50'] = np.abs(features['ote_position'] - 0.5)
    # Killzones (EST hours)
    hour_est = df.index.tz_convert('America/New_York').hour
    features['is_london_kz'] = ((hour_est >= 2) & (hour_est < 5)).astype(int)
    features['is_ny_kz'] = ((hour_est >= 8) & (hour_est < 11)).astype(int)
    features['is_asian_kz'] = (hour_est >= 20).astype(int)  # 20:00-00:00 EST
    # Session strength
    features['session_strength'] = 0.1  # default
    features.loc[features['is_london_kz'] == 1, 'session_strength'] = 0.9
    features.loc[features['is_ny_kz'] == 1, 'session_strength'] = 1.0
    features.loc[features['is_asian_kz'] == 1, 'session_strength'] = 0.3
    # Session overlap: London close + NY open
    features['session_overlap'] = ((hour_est >= 10) & (hour_est < 12)).astype(int)
    return features
```
### Category: Ranges (5)
| Feature | Calculation | Description |
|---------|---------|---------------|
| `weekly_range_position` | Position within the weekly range | 0-1 |
| `daily_range_position` | Position within the daily range | 0-1 |
| `weekly_range_size` | Weekly high - low | Absolute |
| `daily_range_size` | Daily high - low | Absolute |
| `range_expansion_daily` | Current range / average range | >1 = expansion |
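A minimal sketch of these range features, assuming a DatetimeIndex. Running (cumulative) daily and weekly extremes are used so no future bars leak into the features; `bars_per_day=288` (5-minute bars) and the 20-day averaging window are illustrative assumptions, not from the source:
```python
import pandas as pd

def calculate_range_features(df, bars_per_day=288):
    """Sketch of the ICT range features; assumes a DatetimeIndex."""
    features = {}
    day = df.index.normalize()
    # Running daily high/low up to each bar (no look-ahead)
    daily_high = df.groupby(day)['high'].cummax()
    daily_low = df.groupby(day)['low'].cummin()
    daily_range = daily_high - daily_low
    features['daily_range_size'] = daily_range
    features['daily_range_position'] = (df['close'] - daily_low) / (daily_range + 1e-8)
    week = df.index.to_period('W')
    weekly_high = df.groupby(week)['high'].cummax()
    weekly_low = df.groupby(week)['low'].cummin()
    weekly_range = weekly_high - weekly_low
    features['weekly_range_size'] = weekly_range
    features['weekly_range_position'] = (df['close'] - weekly_low) / (weekly_range + 1e-8)
    # Expansion: current daily range vs its rolling ~20-day average
    features['range_expansion_daily'] = daily_range / (
        daily_range.rolling(20 * bars_per_day).mean() + 1e-8
    )
    return features
```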
---
## SMC Features (12)
### Category: Structure Breaks (6)
| Feature | Calculation | Use |
|---------|---------|-----|
| `choch_bullish_count` | Count over a 30-bar window | Bullish CHOCHs |
| `choch_bearish_count` | Count over a 30-bar window | Bearish CHOCHs |
| `bos_bullish_count` | Count over a 30-bar window | Bullish BOS |
| `bos_bearish_count` | Count over a 30-bar window | Bearish BOS |
| `choch_recency` | Bars since last CHOCH | 0 = very recent |
| `bos_recency` | Bars since last BOS | 0 = very recent |
```python
def calculate_smc_features(df):
    # detect_choch, detect_bos, count_signals_in_window and
    # bars_since_last_signal are module-level helpers
    features = {}
    # Detect CHOCHs and BOS
    choch_signals = detect_choch(df, window=20)
    bos_signals = detect_bos(df, window=20)
    # Count by type
    features['choch_bullish_count'] = count_signals_in_window(
        choch_signals, 'bullish_choch', window=30
    )
    features['choch_bearish_count'] = count_signals_in_window(
        choch_signals, 'bearish_choch', window=30
    )
    features['bos_bullish_count'] = count_signals_in_window(
        bos_signals, 'bullish_bos', window=30
    )
    features['bos_bearish_count'] = count_signals_in_window(
        bos_signals, 'bearish_bos', window=30
    )
    # Recency
    features['choch_recency'] = bars_since_last_signal(choch_signals)
    features['bos_recency'] = bars_since_last_signal(bos_signals)
    return features
```
### Category: Displacement & Flow (6)
| Feature | Calculation | Description |
|---------|---------|---------------|
| `displacement_strength` | Move / ATR | Displacement strength |
| `displacement_direction` | 1=bullish, -1=bearish, 0=neutral | Direction |
| `displacement_recency` | Bars since last displacement | Recency |
| `inducement_count` | Count over a 20-bar window | Detected inducements |
| `inducement_bullish` | Count of bullish inducements | Bullish traps |
| `inducement_bearish` | Count of bearish inducements | Bearish traps |
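The displacement rows can be sketched as below. The candle-body definition of a displacement and the 2x-ATR threshold are assumptions for illustration; strength is measured against the previous bar's ATR so a large move does not inflate its own baseline:
```python
import numpy as np
import pandas as pd

def calculate_displacement_features(df, atr_period=14, threshold=2.0):
    """Sketch of the displacement features; threshold and body definition assumed."""
    # ATR, as in the base features
    high_low = df['high'] - df['low']
    high_close = (df['high'] - df['close'].shift()).abs()
    low_close = (df['low'] - df['close'].shift()).abs()
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
    atr = true_range.rolling(atr_period).mean()

    body = df['close'] - df['open']
    features = {}
    # Strength relative to the *previous* bar's ATR
    features['displacement_strength'] = body.abs() / (atr.shift(1) + 1e-8)
    is_displacement = features['displacement_strength'] > threshold
    features['displacement_direction'] = np.sign(body).where(is_displacement, 0.0)
    # Bars since the last displacement candle (0 on the candle itself)
    groups = is_displacement.cumsum()
    features['displacement_recency'] = is_displacement.groupby(groups).cumcount()
    return features
```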
---
## Liquidity Features (10)
| Feature | Calculation | Range | Description |
|---------|---------|-------|---------------|
| `bsl_distance` | `(bsl_level - close) / close` | [0, ∞) | Distance to BSL |
| `ssl_distance` | `(close - ssl_level) / close` | [0, ∞) | Distance to SSL |
| `bsl_density` | Count of nearby BSL levels | [0, ∞) | BSL density |
| `ssl_density` | Count of nearby SSL levels | [0, ∞) | SSL density |
| `bsl_strength` | Volume at the BSL level | [0, ∞) | BSL strength |
| `ssl_strength` | Volume at the SSL level | [0, ∞) | SSL strength |
| `liquidity_grab_count` | Count of recent sweeps | [0, ∞) | Recent sweeps |
| `bsl_sweep_recent` | 1 if recent sweep | {0, 1} | BSL swept |
| `ssl_sweep_recent` | 1 if recent sweep | {0, 1} | SSL swept |
| `near_liquidity` | 1 if within 1% of a level | {0, 1} | Near liquidity |
```python
def calculate_liquidity_features(df, lookback=20):
    # find_liquidity_levels and detect_liquidity_sweeps are module-level helpers
    features = {}
    # BSL (Buy Side Liquidity)
    bsl_levels = find_liquidity_levels(df, 'high', lookback)
    features['bsl_distance'] = (bsl_levels['nearest'] - df['close']) / df['close']
    features['bsl_density'] = bsl_levels['density']
    features['bsl_strength'] = bsl_levels['strength']
    # SSL (Sell Side Liquidity)
    ssl_levels = find_liquidity_levels(df, 'low', lookback)
    features['ssl_distance'] = (df['close'] - ssl_levels['nearest']) / df['close']
    features['ssl_density'] = ssl_levels['density']
    features['ssl_strength'] = ssl_levels['strength']
    # Sweeps
    sweeps = detect_liquidity_sweeps(df, window=30)
    features['liquidity_grab_count'] = len(sweeps)
    features['bsl_sweep_recent'] = int(any(s['type'] == 'bsl' for s in sweeps[-5:]))
    features['ssl_sweep_recent'] = int(any(s['type'] == 'ssl' for s in sweeps[-5:]))
    # Proximity: within 1% of a level
    features['near_liquidity'] = (
        (features['bsl_distance'] < 0.01) | (features['ssl_distance'] < 0.01)
    ).astype(int)
    return features
```
---
## Microstructure Features (8)
**Note:** Requires granular volume or tick data.
| Feature | Calculation | Description |
|---------|---------|---------------|
| `volume_delta` | `buy_volume - sell_volume` | Volume delta |
| `cumulative_volume_delta` | Cumulative CVD | CVD |
| `cvd_slope` | `cvd.diff(5) / 5` | CVD trend |
| `tick_imbalance` | `(upticks - downticks) / total_ticks` | Tick imbalance |
| `large_orders_count` | Count of large orders | Institutional activity |
| `order_flow_imbalance` | Buy/sell ratio | -1 to +1 |
| `poc_distance` | Distance to Point of Control | Distance to POC |
| `hvn_proximity` | Distance to High Volume Node | High-volume zone |
```python
def calculate_microstructure_features(df):
    """Requires extended data: buy_volume, sell_volume, tick data."""
    features = {}
    if 'buy_volume' in df.columns and 'sell_volume' in df.columns:
        features['volume_delta'] = df['buy_volume'] - df['sell_volume']
        features['cumulative_volume_delta'] = features['volume_delta'].cumsum()
        features['cvd_slope'] = features['cumulative_volume_delta'].diff(5) / 5
        total_volume = df['buy_volume'] + df['sell_volume']
        features['order_flow_imbalance'] = features['volume_delta'] / (total_volume + 1e-8)
    # Large orders
    threshold = df['volume'].rolling(20).mean() * 2
    features['large_orders_count'] = (df['volume'] > threshold).rolling(30).sum()
    # Volume profile (calculate_volume_profile is a module-level helper)
    volume_profile = calculate_volume_profile(df, bins=50)
    features['poc_distance'] = (df['close'] - volume_profile['poc']) / df['close']
    return features
```
---
## Model Targets
### Target 1: AMD Phase (AMDDetector)
```python
TARGET_AMD_PHASE = {
    0: 'neutral',
    1: 'accumulation',
    2: 'manipulation',
    3: 'distribution'
}

def label_amd_phase(df, i, forward_window=20):
    """See ESTRATEGIA-AMD-COMPLETA.md for the full labeling rules."""
    # Full implementation lives in the AMD document
    pass
```
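Since the authoritative rules live in ESTRATEGIA-AMD-COMPLETA.md, the following is only an illustrative heuristic of what such a labeler can look like; the tight-range and breakout thresholds (1%, 0.5%) are assumptions, not the documented rules:
```python
import pandas as pd

def label_amd_phase_sketch(df, i, forward_window=20):
    """Illustrative AMD-phase labeling heuristic (thresholds assumed)."""
    if i < 20 or i + forward_window >= len(df):
        return 0  # neutral when there is not enough context
    window = df.iloc[i - 20:i]
    future = df.iloc[i:i + forward_window]
    range_pct = (window['high'].max() - window['low'].min()) / df['close'].iloc[i]
    breakout_up = future['high'].max() > window['high'].max() * 1.005
    breakout_down = future['low'].min() < window['low'].min() * 0.995
    if breakout_up and breakout_down:
        return 2  # manipulation: liquidity taken on both sides
    if range_pct < 0.01 and breakout_up:
        return 1  # accumulation: tight range resolved upward
    if range_pct < 0.01 and breakout_down:
        return 3  # distribution: tight range resolved downward
    return 0
```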
### Target 2: Delta High/Low (RangePredictor)
```python
import numpy as np
import pandas as pd

# Regression targets
TARGETS_RANGE = {
    'delta_high_15m': float,  # continuous prediction
    'delta_low_15m': float,
    'delta_high_1h': float,
    'delta_low_1h': float,
    # Classification targets (bins)
    'bin_high_15m': int,  # 0-3
    'bin_low_15m': int,
    'bin_high_1h': int,
    'bin_low_1h': int
}

def calculate_range_targets(df, horizons={'15m': 3, '1h': 12}):
    targets = {}
    atr = calculate_atr(df, 14)  # module helper (14-period ATR)
    atr_pct = atr / df['close']  # express ATR in price-relative terms, matching the deltas
    for name, periods in horizons.items():
        # Delta high
        targets[f'delta_high_{name}'] = (
            df['high'].rolling(periods).max().shift(-periods) - df['close']
        ) / df['close']
        # Delta low
        targets[f'delta_low_{name}'] = (
            df['close'] - df['low'].rolling(periods).min().shift(-periods)
        ) / df['close']
        # Bins (delta normalized by ATR)
        def to_bin(delta_series):
            ratio = delta_series / atr_pct
            bins = pd.cut(
                ratio,
                bins=[-np.inf, 0.3, 0.7, 1.2, np.inf],
                labels=[0, 1, 2, 3]
            )
            return bins.astype(float)
        targets[f'bin_high_{name}'] = to_bin(targets[f'delta_high_{name}'])
        targets[f'bin_low_{name}'] = to_bin(targets[f'delta_low_{name}'])
    return pd.DataFrame(targets)
```
### Target 3: TP vs SL (TPSLClassifier)
```python
import numpy as np
import pandas as pd

TARGETS_TPSL = {
    'tp_first_15m_rr_2_1': int,  # 0 or 1
    'tp_first_15m_rr_3_1': int,
    'tp_first_1h_rr_2_1': int,
    'tp_first_1h_rr_3_1': int
}

def calculate_tpsl_targets(df, rr_configs):
    """Simulates whether TP is reached before SL (long side)."""
    targets = {}
    atr = calculate_atr(df, 14)  # module helper (14-period ATR)
    for rr in rr_configs:
        sl_dist = atr * rr['sl_atr_multiple']
        tp_dist = atr * rr['tp_atr_multiple']

        def check_tp_first(i, horizon_bars):
            if i + horizon_bars >= len(df):
                return np.nan
            entry_price = df['close'].iloc[i]
            sl_price = entry_price - sl_dist.iloc[i]
            tp_price = entry_price + tp_dist.iloc[i]
            future = df.iloc[i+1:i+horizon_bars+1]
            for _, row in future.iterrows():
                # SL is checked first within a bar: a conservative tie-break
                if row['low'] <= sl_price:
                    return 0  # SL hit first
                elif row['high'] >= tp_price:
                    return 1  # TP hit first
            return np.nan  # neither hit

        for horizon_name, horizon_bars in [('15m', 3), ('1h', 12)]:
            target_name = f'tp_first_{horizon_name}_{rr["name"]}'
            targets[target_name] = [
                check_tp_first(i, horizon_bars) for i in range(len(df))
            ]
    return pd.DataFrame(targets)
```
### Target 4: Liquidity Sweep (LiquidityHunter)
```python
import numpy as np

TARGETS_LIQUIDITY = {
    'bsl_sweep': int,  # 0 or 1
    'ssl_sweep': int,
    'any_sweep': int,
    'sweep_timing': int  # bars until the sweep
}

def label_liquidity_sweep(df, i, forward_window=10):
    """Labels whether a liquidity sweep occurs within the forward window."""
    if i + forward_window >= len(df):
        return {'bsl_sweep': np.nan, 'ssl_sweep': np.nan}
    swing_high = df['high'].iloc[max(0, i-20):i].max()
    swing_low = df['low'].iloc[max(0, i-20):i].min()
    future = df.iloc[i:i+forward_window]
    # BSL sweep (sweep of the highs)
    bsl_hits = future['high'] >= swing_high * 1.005
    bsl_swept = bsl_hits.any()
    # SSL sweep (sweep of the lows)
    ssl_hits = future['low'] <= swing_low * 0.995
    ssl_swept = ssl_hits.any()
    # Timing as a bar offset from i (argmax of the boolean mask, not an index label)
    if bsl_swept:
        sweep_timing = int(bsl_hits.to_numpy().argmax())
    elif ssl_swept:
        sweep_timing = int(ssl_hits.to_numpy().argmax())
    else:
        sweep_timing = np.nan
    return {
        'bsl_sweep': 1 if bsl_swept else 0,
        'ssl_sweep': 1 if ssl_swept else 0,
        'any_sweep': 1 if (bsl_swept or ssl_swept) else 0,
        'sweep_timing': sweep_timing
    }
```
### Target 5: Order Flow (OrderFlowAnalyzer)
```python
import numpy as np

TARGETS_ORDER_FLOW = {
    'flow_type': int,  # 0=neutral, 1=accumulation, 2=distribution
    'institutional_activity': float  # 0-1 score
}

def label_order_flow(df, i, forward_window=50):
    """Based on CVD and large orders."""
    if 'cumulative_volume_delta' not in df.columns:
        return {'flow_type': 0, 'institutional_activity': 0.0}
    if i + forward_window >= len(df):
        # Not enough forward data to label
        return {'flow_type': np.nan, 'institutional_activity': np.nan}
    current_cvd = df['cumulative_volume_delta'].iloc[i]
    future_cvd = df['cumulative_volume_delta'].iloc[i + forward_window]
    cvd_change = future_cvd - current_cvd
    # Large orders in the window
    large_orders = df['large_orders_count'].iloc[i:i+forward_window].sum()
    if cvd_change > 0 and large_orders > 5:
        flow_type = 1  # accumulation
    elif cvd_change < 0 and large_orders > 5:
        flow_type = 2  # distribution
    else:
        flow_type = 0  # neutral
    institutional_activity = min(1.0, large_orders / 10)
    return {
        'flow_type': flow_type,
        'institutional_activity': institutional_activity
    }
```
---
## Feature Engineering Pipeline
### Full Pipeline
```python
import pandas as pd

class FeatureEngineeringPipeline:
    """Complete feature engineering pipeline."""

    def __init__(self, config=None):
        self.config = config or {}
        self.scalers = {}

    def transform(self, df):
        """Transforms raw OHLCV into the full feature set."""
        features = pd.DataFrame(index=df.index)
        # 1. Base features
        print("Extracting base features...")
        base = self._extract_base_features(df)
        features = pd.concat([features, base], axis=1)
        # 2. AMD features
        print("Extracting AMD features...")
        amd = self._extract_amd_features(df)
        features = pd.concat([features, amd], axis=1)
        # 3. ICT features
        print("Extracting ICT features...")
        ict = self._extract_ict_features(df)
        features = pd.concat([features, ict], axis=1)
        # 4. SMC features
        print("Extracting SMC features...")
        smc = self._extract_smc_features(df)
        features = pd.concat([features, smc], axis=1)
        # 5. Liquidity features
        print("Extracting liquidity features...")
        liquidity = self._extract_liquidity_features(df)
        features = pd.concat([features, liquidity], axis=1)
        # 6. Microstructure (if available)
        if 'buy_volume' in df.columns:
            print("Extracting microstructure features...")
            micro = self._extract_microstructure_features(df)
            features = pd.concat([features, micro], axis=1)
        # 7. Scaling
        print("Scaling features...")
        features_scaled = self._scale_features(features)
        # 8. Handle missing values
        features_scaled = features_scaled.ffill().fillna(0)
        return features_scaled

    def _extract_base_features(self, df):
        """Extracts the 21 base features; the other _extract_* methods follow the same pattern."""
        features = {}
        # Volatility
        features.update(calculate_volatility_features(df))
        # Momentum
        features.update(calculate_momentum_features(df))
        # Moving averages
        features.update(calculate_ma_features(df))
        return pd.DataFrame(features)

    def _scale_features(self, features):
        """Scales features with a per-column RobustScaler."""
        from sklearn.preprocessing import RobustScaler
        if not self.scalers:
            # Fit scalers
            for col in features.columns:
                self.scalers[col] = RobustScaler()
                features[col] = self.scalers[col].fit_transform(
                    features[col].values.reshape(-1, 1)
                )
        else:
            # Transform with already-fitted scalers
            for col in features.columns:
                if col in self.scalers:
                    features[col] = self.scalers[col].transform(
                        features[col].values.reshape(-1, 1)
                    )
        return features
```
### Pipeline Usage
```python
# Initialize
pipeline = FeatureEngineeringPipeline()
# Transform data
df_raw = load_ohlcv_data('BTCUSDT', '5m')
features = pipeline.transform(df_raw)
print(f"Features shape: {features.shape}")
print(f"Features: {features.columns.tolist()}")
# Features ready for the ML models
X = features.values
```
---
## Technical Considerations
### 1. Preventing Look-Ahead Bias
**IMPORTANT:** Never use future data to compute features.
```python
# ✅ CORRECT
sma_20 = df['close'].rolling(20).mean()

# ❌ WRONG
sma_20 = df['close'].rolling(20, center=True).mean()  # uses future data!
```
### 2. Handling Missing Values
```python
def handle_missing(features):
    """Imputation strategy."""
    # 1. Forward fill (use the last known value)
    features = features.ffill()
    # 2. If NaNs remain at the start, use 0
    features = features.fillna(0)
    # 3. Alternative: use the median
    # features = features.fillna(features.median())
    return features
```
### 3. Feature Scaling
```python
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler

# Price-based features → RobustScaler (robust to outliers)
price_scaler = RobustScaler()
# Indicators → StandardScaler
indicator_scaler = StandardScaler()
# Ratios/percentages → MinMaxScaler
ratio_scaler = MinMaxScaler(feature_range=(0, 1))
```
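The three scalers can be applied to their column groups in one step with scikit-learn's `ColumnTransformer`. The column names below are hypothetical examples drawn from the catalog, and random data stands in for real features:
```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler

# Hypothetical column groups -- adjust to the actual feature catalog
price_cols = ['sma_10', 'sma_20']
indicator_cols = ['rsi_14', 'momentum_5']
ratio_cols = ['buying_pressure', 'sma_ratio_20']

scaler = ColumnTransformer([
    ('price', RobustScaler(), price_cols),
    ('indicators', StandardScaler(), indicator_cols),
    ('ratios', MinMaxScaler(), ratio_cols),
])

# Placeholder data just to demonstrate the fit/transform flow
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(100, 6)),
    columns=price_cols + indicator_cols + ratio_cols,
)
X = scaler.fit_transform(features)  # columns come out in transformer order
```
Fitting the transformer on the training split only, then reusing it on validation/test, keeps the scaling consistent with the temporal-validation rules below.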
### 4. Feature Selection
```python
import pandas as pd

def select_important_features(X, y, model, feature_names, top_n=50):
    """Selects the most important features."""
    # Train model
    model.fit(X, y)
    # Get importances
    importance = pd.DataFrame({
        'feature': feature_names,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)
    # Select top N
    selected_features = importance.head(top_n)['feature'].tolist()
    return selected_features
```
### 5. Temporal Validation
```python
def temporal_validation_split(df, train_pct=0.7, val_pct=0.15):
    """Strict temporal split (no shuffle)."""
    n = len(df)
    train_end = int(n * train_pct)
    val_end = int(n * (train_pct + val_pct))
    df_train = df.iloc[:train_end]
    df_val = df.iloc[train_end:val_end]
    df_test = df.iloc[val_end:]
    # Verify there is no overlap
    assert df_train.index[-1] < df_val.index[0]
    assert df_val.index[-1] < df_test.index[0]
    return df_train, df_val, df_test
```
---
## Dimension Summary
| Category | Features | Models |
|-----------|----------|---------|
| **Base Technical** | 21 | All |
| **AMD** | 25 | AMD, Range, TPSL |
| **ICT** | 15 | Range, TPSL |
| **SMC** | 12 | Range, TPSL |
| **Liquidity** | 10 | Liquidity, TPSL |
| **Microstructure** | 8 | OrderFlow |
| **TOTAL** | **91 features** | - |
---
**Document Generated:** 2025-12-05
**Next Review:** 2025-Q1
**Contact:** ml-engineering@trading.ai