---
id: "FEATURES-TARGETS-ML"
title: "Feature and Target Catalog - Machine Learning"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---

# Feature and Target Catalog - Machine Learning

**Version:** 1.0.0
**Date:** 2025-12-05
**Module:** OQI-006-ml-signals
**Author:** Trading Strategist - Trading Platform

---
## Table of Contents

1. [Introduction](#introduction)
2. [Base Features (21)](#base-features-21)
3. [AMD Features (25)](#amd-features-25)
4. [ICT Features (15)](#ict-features-15)
5. [SMC Features (12)](#smc-features-12)
6. [Liquidity Features (10)](#liquidity-features-10)
7. [Microstructure Features (8)](#microstructure-features-8)
8. [Model Targets](#model-targets)
9. [Feature Engineering Pipeline](#feature-engineering-pipeline)
10. [Technical Considerations](#technical-considerations)

---
## Introduction

This document defines the complete catalog of features (input variables) and targets (prediction variables) used by the Trading Platform ML models.

### Total Dimensions

| Category | Features | Models that use them |
|-----------|----------|---------------------|
| **Base Technical** | 21 | All |
| **AMD** | 25 | AMDDetector, Range, TPSL |
| **ICT** | 15 | Range, TPSL, Orchestrator |
| **SMC** | 12 | Range, TPSL, Orchestrator |
| **Liquidity** | 10 | LiquidityHunter, TPSL |
| **Microstructure** | 8 | OrderFlow (optional) |
| **Total Base** | 91 features | - |

---
## Base Features (21)

### Category: Volatility (8)

| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `volatility_5` | `close.pct_change().rolling(5).std()` | [0, ∞) | 5-period volatility |
| `volatility_10` | `close.pct_change().rolling(10).std()` | [0, ∞) | 10-period volatility |
| `volatility_20` | `close.pct_change().rolling(20).std()` | [0, ∞) | 20-period volatility |
| `volatility_50` | `close.pct_change().rolling(50).std()` | [0, ∞) | 50-period volatility |
| `atr_5` | `TrueRange.rolling(5).mean()` | [0, ∞) | 5-period Average True Range |
| `atr_10` | `TrueRange.rolling(10).mean()` | [0, ∞) | 10-period Average True Range |
| `atr_14` | `TrueRange.rolling(14).mean()` | [0, ∞) | 14-period Average True Range (standard) |
| `atr_ratio` | `atr_14 / atr_14.rolling(50).mean()` | [0, ∞) | Current ATR vs. its 50-period average |

```python
import numpy as np
import pandas as pd

def calculate_volatility_features(df):
    features = {}
    for period in [5, 10, 20, 50]:
        features[f'volatility_{period}'] = df['close'].pct_change().rolling(period).std()

    # True Range: max of (high-low), |high - prev close|, |low - prev close|
    high_low = df['high'] - df['low']
    high_close = np.abs(df['high'] - df['close'].shift())
    low_close = np.abs(df['low'] - df['close'].shift())
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)

    for period in [5, 10, 14]:
        features[f'atr_{period}'] = true_range.rolling(period).mean()

    features['atr_ratio'] = features['atr_14'] / features['atr_14'].rolling(50).mean()

    return features
```
### Category: Momentum (6)

| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `momentum_5` | `close - close.shift(5)` | (-∞, ∞) | 5-period momentum |
| `momentum_10` | `close - close.shift(10)` | (-∞, ∞) | 10-period momentum |
| `momentum_20` | `close - close.shift(20)` | (-∞, ∞) | 20-period momentum |
| `roc_5` | `(close / close.shift(5) - 1) * 100` | (-100, ∞) | 5-period Rate of Change |
| `roc_10` | `(close / close.shift(10) - 1) * 100` | (-100, ∞) | 10-period Rate of Change |
| `rsi_14` | See RSI formula | [0, 100] | Relative Strength Index |

```python
def calculate_momentum_features(df):
    features = {}

    # Momentum
    for period in [5, 10, 20]:
        features[f'momentum_{period}'] = df['close'] - df['close'].shift(period)

    # Rate of Change (only the two periods listed in the catalog)
    for period in [5, 10]:
        features[f'roc_{period}'] = (df['close'] / df['close'].shift(period) - 1) * 100

    # RSI (simple-moving-average variant of Wilder's RSI)
    delta = df['close'].diff()
    gain = delta.where(delta > 0, 0).rolling(14).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(14).mean()
    rs = gain / loss
    features['rsi_14'] = 100 - (100 / (1 + rs))

    return features
```
### Category: Moving Averages (7)

| Feature | Formula | Range | Description |
|---------|---------|-------|-------------|
| `sma_10` | `close.rolling(10).mean()` | [0, ∞) | Simple Moving Average 10 |
| `sma_20` | `close.rolling(20).mean()` | [0, ∞) | Simple Moving Average 20 |
| `sma_50` | `close.rolling(50).mean()` | [0, ∞) | Simple Moving Average 50 |
| `sma_ratio_10` | `close / sma_10` | [0, ∞) | Price/SMA10 ratio |
| `sma_ratio_20` | `close / sma_20` | [0, ∞) | Price/SMA20 ratio |
| `sma_ratio_50` | `close / sma_50` | [0, ∞) | Price/SMA50 ratio |
| `sma_slope_20` | `sma_20.diff(5) / 5` | (-∞, ∞) | SMA20 slope |

```python
def calculate_ma_features(df):
    features = {}

    for period in [10, 20, 50]:
        features[f'sma_{period}'] = df['close'].rolling(period).mean()
        features[f'sma_ratio_{period}'] = df['close'] / features[f'sma_{period}']

    features['sma_slope_20'] = features['sma_20'].diff(5) / 5

    return features
```

---
## AMD Features (25)

### Category: Price Action (10)

| Feature | Calculation | Range | Use |
|---------|---------|-------|-----|
| `range_ratio` | `(high - low) / high.rolling(20).mean()` | [0, ∞) | Range compression |
| `range_ma` | `(high - low).rolling(20).mean()` | [0, ∞) | Average range |
| `hl_range_pct` | `(high - low) / close` | [0, 1] | Range as % of price |
| `body_size` | `abs(close - open) / (high - low)` | [0, 1] | Candle body size |
| `upper_wick` | `(high - max(close, open)) / (high - low)` | [0, 1] | Upper wick |
| `lower_wick` | `(min(close, open) - low) / (high - low)` | [0, 1] | Lower wick |
| `buying_pressure` | `(close - low) / (high - low)` | [0, 1] | Buying pressure |
| `selling_pressure` | `(high - close) / (high - low)` | [0, 1] | Selling pressure |
| `close_position` | `(close - low) / (high - low)` | [0, 1] | Position of the close |
| `range_expansion` | `(high - low) / (high - low).shift(1)` | [0, ∞) | Range expansion |
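
Unlike the other categories, this table has no accompanying snippet. A minimal sketch of the formulas above (mirroring the table as written, including `close_position` sharing `buying_pressure`'s formula) might look like:

```python
import numpy as np
import pandas as pd

def calculate_price_action_features(df):
    """Price-action features from the table above (illustrative sketch)."""
    features = {}
    bar_range = df['high'] - df['low']
    safe_range = bar_range.replace(0, np.nan)  # avoid division by zero on doji bars

    features['range_ratio'] = bar_range / df['high'].rolling(20).mean()
    features['range_ma'] = bar_range.rolling(20).mean()
    features['hl_range_pct'] = bar_range / df['close']
    features['body_size'] = (df['close'] - df['open']).abs() / safe_range
    features['upper_wick'] = (df['high'] - df[['close', 'open']].max(axis=1)) / safe_range
    features['lower_wick'] = (df[['close', 'open']].min(axis=1) - df['low']) / safe_range
    features['buying_pressure'] = (df['close'] - df['low']) / safe_range
    features['selling_pressure'] = (df['high'] - df['close']) / safe_range
    features['close_position'] = features['buying_pressure']  # same formula in the table
    features['range_expansion'] = bar_range / bar_range.shift(1)
    return features
```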
### Category: Volume (8)

| Feature | Calculation | Description |
|---------|---------|---------------|
| `volume_ratio` | `volume / volume.rolling(20).mean()` | Volume vs. average |
| `volume_trend` | `volume.rolling(10).mean() - volume.rolling(30).mean()` | Volume trend |
| `volume_ma` | `volume.rolling(20).mean()` | Average volume |
| `volume_spike_count` | `(volume > volume_ma * 2).rolling(30).sum()` | Recent spikes |
| `obv` | See OBV calculation | On-Balance Volume |
| `obv_slope` | `obv.diff(5) / 5` | OBV trend |
| `vwap_distance` | `(close - vwap) / close` | Distance to VWAP |
| `volume_on_up` | See calculation | Volume on up moves |

```python
def calculate_volume_features(df):
    features = {}

    features['volume_ratio'] = df['volume'] / df['volume'].rolling(20).mean()
    features['volume_trend'] = df['volume'].rolling(10).mean() - df['volume'].rolling(30).mean()
    features['volume_ma'] = df['volume'].rolling(20).mean()
    features['volume_spike_count'] = (df['volume'] > features['volume_ma'] * 2).rolling(30).sum()

    # OBV (sign convention: +1 on up closes, -1 otherwise)
    obv = (df['volume'] * ((df['close'] > df['close'].shift(1)).astype(int) * 2 - 1)).cumsum()
    features['obv'] = obv
    features['obv_slope'] = obv.diff(5) / 5

    # VWAP (cumulative from the start of the series)
    vwap = (df['close'] * df['volume']).cumsum() / df['volume'].cumsum()
    features['vwap_distance'] = (df['close'] - vwap) / df['close']

    # Share of recent volume traded on up closes
    # (one plausible definition of `volume_on_up` from the table)
    up_volume = df['volume'].where(df['close'] > df['close'].shift(1), 0)
    features['volume_on_up'] = up_volume.rolling(20).sum() / df['volume'].rolling(20).sum()

    return features
```
### Category: Market Structure (7)

| Feature | Calculation | Use |
|---------|---------|-----|
| `higher_highs_count` | `(high > high.shift(1)).rolling(10).sum()` | HH count |
| `higher_lows_count` | `(low > low.shift(1)).rolling(10).sum()` | HL count |
| `lower_highs_count` | `(high < high.shift(1)).rolling(10).sum()` | LH count |
| `lower_lows_count` | `(low < low.shift(1)).rolling(10).sum()` | LL count |
| `swing_high_distance` | `(swing_high_20 - close) / close` | Distance to swing high |
| `swing_low_distance` | `(close - swing_low_20) / close` | Distance to swing low |
| `market_structure_score` | See calculation | Structure score |

```python
def calculate_market_structure_features(df):
    features = {}

    features['higher_highs_count'] = (df['high'] > df['high'].shift(1)).rolling(10).sum()
    features['higher_lows_count'] = (df['low'] > df['low'].shift(1)).rolling(10).sum()
    features['lower_highs_count'] = (df['high'] < df['high'].shift(1)).rolling(10).sum()
    features['lower_lows_count'] = (df['low'] < df['low'].shift(1)).rolling(10).sum()

    swing_high = df['high'].rolling(20).max()
    swing_low = df['low'].rolling(20).min()

    features['swing_high_distance'] = (swing_high - df['close']) / df['close']
    features['swing_low_distance'] = (df['close'] - swing_low) / df['close']

    # Market structure score (-1 bearish, +1 bullish)
    bullish_score = (features['higher_highs_count'] + features['higher_lows_count']) / 20
    bearish_score = (features['lower_highs_count'] + features['lower_lows_count']) / 20
    features['market_structure_score'] = bullish_score - bearish_score

    return features
```

---
## ICT Features (15)

### Category: OTE & Fibonacci (5)

| Feature | Calculation | Range | Description |
|---------|---------|-------|-------------|
| `ote_position` | `(close - swing_low) / (swing_high - swing_low)` | [0, 1] | Position within the range |
| `in_discount_zone` | `1 if ote_position < 0.38 else 0` | {0, 1} | In the discount zone |
| `in_premium_zone` | `1 if ote_position > 0.62 else 0` | {0, 1} | In the premium zone |
| `in_ote_buy_zone` | `1 if 0.62 <= ote_position <= 0.79 else 0` | {0, 1} | In the buy OTE |
| `fib_distance_50` | `abs(ote_position - 0.5)` | [0, 0.5] | Distance to equilibrium |

### Category: Killzones & Timing (5)

| Feature | Calculation | Description |
|---------|---------|---------------|
| `is_london_kz` | Based on EST hour | London killzone |
| `is_ny_kz` | Based on EST hour | NY killzone |
| `is_asian_kz` | Based on EST hour | Asian killzone |
| `session_strength` | 0-1 per killzone | Session strength |
| `session_overlap` | Overlap detection | London/NY overlap |
```python
def calculate_ict_features(df):
    features = {}

    # OTE position
    swing_high = df['high'].rolling(50).max()
    swing_low = df['low'].rolling(50).min()
    range_size = swing_high - swing_low

    features['ote_position'] = (df['close'] - swing_low) / (range_size + 1e-8)
    features['in_discount_zone'] = (features['ote_position'] < 0.38).astype(int)
    features['in_premium_zone'] = (features['ote_position'] > 0.62).astype(int)
    features['in_ote_buy_zone'] = (
        (features['ote_position'] >= 0.62) & (features['ote_position'] <= 0.79)
    ).astype(int)
    features['fib_distance_50'] = np.abs(features['ote_position'] - 0.5)

    # Killzones (requires a tz-aware DatetimeIndex)
    hour_est = pd.Series(df.index.tz_convert('America/New_York').hour, index=df.index)
    features['is_london_kz'] = ((hour_est >= 2) & (hour_est < 5)).astype(int)
    features['is_ny_kz'] = ((hour_est >= 8) & (hour_est < 11)).astype(int)
    features['is_asian_kz'] = (hour_est >= 20).astype(int)

    # Session strength (features is a plain dict, so build the series first)
    session_strength = pd.Series(0.1, index=df.index)  # default outside killzones
    session_strength[features['is_asian_kz'] == 1] = 0.3
    session_strength[features['is_london_kz'] == 1] = 0.9
    session_strength[features['is_ny_kz'] == 1] = 1.0
    features['session_strength'] = session_strength

    # Session overlap (London close + NY open)
    features['session_overlap'] = ((hour_est >= 10) & (hour_est < 12)).astype(int)

    return features
```
### Category: Ranges (5)

| Feature | Calculation | Notes |
|---------|---------|---------------|
| `weekly_range_position` | Position within the weekly range | 0-1 |
| `daily_range_position` | Position within the daily range | 0-1 |
| `weekly_range_size` | Weekly high - low | Absolute |
| `daily_range_size` | Daily high - low | Absolute |
| `range_expansion_daily` | Current range / average range | >1 = expansion |
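
These range features can be sketched as follows. This is an illustrative implementation assuming intraday bars on a `DatetimeIndex`; the running cummax/cummin uses only bars already seen within the current day or week, so there is no look-ahead.

```python
import pandas as pd

def calculate_range_features(df):
    """Daily/weekly range features (sketch). Assumes a DatetimeIndex."""
    features = pd.DataFrame(index=df.index)

    for name, freq in [('daily', 'D'), ('weekly', 'W')]:
        period = df.index.to_period(freq)
        # Running extremes inside the current period only (no look-ahead)
        period_high = df['high'].groupby(period).cummax()
        period_low = df['low'].groupby(period).cummin()
        size = period_high - period_low
        features[f'{name}_range_size'] = size
        features[f'{name}_range_position'] = (df['close'] - period_low) / (size + 1e-8)

    # Expansion: current daily range vs its rolling average (window in bars,
    # so adjust per timeframe)
    avg_daily = features['daily_range_size'].rolling(20).mean()
    features['range_expansion_daily'] = features['daily_range_size'] / (avg_daily + 1e-8)
    return features
```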
---

## SMC Features (12)

### Category: Structure Breaks (6)

| Feature | Calculation | Use |
|---------|---------|-----|
| `choch_bullish_count` | Count over a 30-bar window | Bullish CHOCHs |
| `choch_bearish_count` | Count over a 30-bar window | Bearish CHOCHs |
| `bos_bullish_count` | Count over a 30-bar window | Bullish BOS |
| `bos_bearish_count` | Count over a 30-bar window | Bearish BOS |
| `choch_recency` | Bars since last CHOCH | 0 = very recent |
| `bos_recency` | Bars since last BOS | 0 = very recent |

```python
def calculate_smc_features(df):
    features = {}

    # Detect CHOCHs and BOS (detect_choch, detect_bos, count_signals_in_window
    # and bars_since_last_signal are module helpers)
    choch_signals = detect_choch(df, window=20)
    bos_signals = detect_bos(df, window=20)

    # Count by type
    features['choch_bullish_count'] = count_signals_in_window(
        choch_signals, 'bullish_choch', window=30
    )
    features['choch_bearish_count'] = count_signals_in_window(
        choch_signals, 'bearish_choch', window=30
    )
    features['bos_bullish_count'] = count_signals_in_window(
        bos_signals, 'bullish_bos', window=30
    )
    features['bos_bearish_count'] = count_signals_in_window(
        bos_signals, 'bearish_bos', window=30
    )

    # Recency
    features['choch_recency'] = bars_since_last_signal(choch_signals)
    features['bos_recency'] = bars_since_last_signal(bos_signals)

    return features
```
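
`detect_choch` and `detect_bos` are not shown in this catalog. A deliberately simplified stand-in for `detect_choch` (an assumption for illustration: really a rolling-level break detector, not full swing-structure logic) could look like:

```python
import pandas as pd

def detect_choch(df, window=20):
    """Simplified change-of-character sketch: a bullish CHOCH fires when the
    close breaks above the prior rolling swing high while the previous close
    was still at or below it; bearish is symmetric on swing lows."""
    swing_high = df['high'].rolling(window).max().shift(1)
    swing_low = df['low'].rolling(window).min().shift(1)
    signals = pd.DataFrame(index=df.index)
    signals['bullish_choch'] = ((df['close'] > swing_high)
                                & (df['close'].shift(1) <= swing_high.shift(1))).astype(int)
    signals['bearish_choch'] = ((df['close'] < swing_low)
                                & (df['close'].shift(1) >= swing_low.shift(1))).astype(int)
    return signals
```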
### Category: Displacement & Flow (6)

| Feature | Calculation | Description |
|---------|---------|---------------|
| `displacement_strength` | Move / ATR | Displacement strength |
| `displacement_direction` | 1=bullish, -1=bearish, 0=neutral | Direction |
| `displacement_recency` | Bars since the last one | Recency |
| `inducement_count` | Count over a 20-bar window | Detected inducements |
| `inducement_bullish` | Count of bullish inducements | Bullish traps |
| `inducement_bearish` | Count of bearish inducements | Bearish traps |
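
The displacement features can be sketched as below. The 2-ATR `threshold` is an illustrative assumption, and the ATR is shifted one bar so a displacement candle is measured against the volatility that preceded it rather than inflating its own baseline.

```python
import numpy as np
import pandas as pd

def calculate_displacement_features(df, atr_period=14, threshold=2.0):
    """Displacement features (sketch): a displacement candle has a body
    larger than `threshold` times the ATR of the preceding bars."""
    features = {}

    # True range and prior-bar ATR
    high_low = df['high'] - df['low']
    high_close = (df['high'] - df['close'].shift()).abs()
    low_close = (df['low'] - df['close'].shift()).abs()
    tr = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
    prior_atr = tr.rolling(atr_period).mean().shift(1)

    body = df['close'] - df['open']
    features['displacement_strength'] = body.abs() / (prior_atr + 1e-8)
    is_disp = features['displacement_strength'] > threshold
    features['displacement_direction'] = np.sign(body).where(is_disp, 0.0)

    # Bars since the last displacement candle
    idx = np.arange(len(df))
    last = pd.Series(np.where(is_disp, idx, np.nan), index=df.index).ffill()
    features['displacement_recency'] = idx - last
    return features
```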
---

## Liquidity Features (10)

| Feature | Calculation | Range | Description |
|---------|---------|-------|-------------|
| `bsl_distance` | `(bsl_level - close) / close` | [0, ∞) | Distance to BSL |
| `ssl_distance` | `(close - ssl_level) / close` | [0, ∞) | Distance to SSL |
| `bsl_density` | Count of nearby BSL levels | [0, ∞) | BSL density |
| `ssl_density` | Count of nearby SSL levels | [0, ∞) | SSL density |
| `bsl_strength` | Volume at the BSL level | [0, ∞) | BSL strength |
| `ssl_strength` | Volume at the SSL level | [0, ∞) | SSL strength |
| `liquidity_grab_count` | Count of recent sweeps | [0, ∞) | Recent sweeps |
| `bsl_sweep_recent` | 1 if recently swept | {0, 1} | BSL swept |
| `ssl_sweep_recent` | 1 if recently swept | {0, 1} | SSL swept |
| `near_liquidity` | 1 if <1% from a level | {0, 1} | Near liquidity |
```python
def calculate_liquidity_features(df, lookback=20):
    features = {}

    # BSL (Buy Side Liquidity) — find_liquidity_levels is a module helper
    bsl_levels = find_liquidity_levels(df, 'high', lookback)
    features['bsl_distance'] = (bsl_levels['nearest'] - df['close']) / df['close']
    features['bsl_density'] = bsl_levels['density']
    features['bsl_strength'] = bsl_levels['strength']

    # SSL (Sell Side Liquidity)
    ssl_levels = find_liquidity_levels(df, 'low', lookback)
    features['ssl_distance'] = (df['close'] - ssl_levels['nearest']) / df['close']
    features['ssl_density'] = ssl_levels['density']
    features['ssl_strength'] = ssl_levels['strength']

    # Sweeps
    sweeps = detect_liquidity_sweeps(df, window=30)
    features['liquidity_grab_count'] = len(sweeps)
    features['bsl_sweep_recent'] = int(any(s['type'] == 'bsl' for s in sweeps[-5:]))
    features['ssl_sweep_recent'] = int(any(s['type'] == 'ssl' for s in sweeps[-5:]))

    # Proximity
    features['near_liquidity'] = (
        (features['bsl_distance'] < 0.01) | (features['ssl_distance'] < 0.01)
    ).astype(int)

    return features
```
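
`find_liquidity_levels` and `detect_liquidity_sweeps` are module helpers not shown here. A simplified stand-in for the sweep detector (an assumption for illustration: a sweep is a bar that trades through a prior swing level but closes back inside it) could be:

```python
import pandas as pd

def detect_liquidity_sweeps(df, window=30, lookback=20):
    """Simplified sweep detector (sketch): a BSL sweep trades above the prior
    swing high but closes back below it; an SSL sweep is symmetric on lows.
    Returns the sweeps found in the last `window` bars as a list of dicts."""
    swing_high = df['high'].rolling(lookback).max().shift(1)
    swing_low = df['low'].rolling(lookback).min().shift(1)
    sweeps = []
    for ts in df.index[-window:]:
        if df.at[ts, 'high'] > swing_high[ts] and df.at[ts, 'close'] < swing_high[ts]:
            sweeps.append({'type': 'bsl', 'timestamp': ts})
        elif df.at[ts, 'low'] < swing_low[ts] and df.at[ts, 'close'] > swing_low[ts]:
            sweeps.append({'type': 'ssl', 'timestamp': ts})
    return sweeps
```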
---
## Microstructure Features (8)

**Note:** Requires granular volume or tick data.

| Feature | Calculation | Description |
|---------|---------|---------------|
| `volume_delta` | `buy_volume - sell_volume` | Volume delta |
| `cumulative_volume_delta` | Cumulative delta | CVD |
| `cvd_slope` | `cvd.diff(5) / 5` | CVD trend |
| `tick_imbalance` | `(upticks - downticks) / total_ticks` | Tick imbalance |
| `large_orders_count` | Count of large orders | Institutional activity |
| `order_flow_imbalance` | Buy/sell ratio | -1 to +1 |
| `poc_distance` | Distance to the Point of Control | Distance to POC |
| `hvn_proximity` | Distance to a High Volume Node | High-volume zone |
```python
def calculate_microstructure_features(df):
    """
    Requires extended data: buy_volume, sell_volume, tick data
    """
    features = {}

    if 'buy_volume' in df.columns and 'sell_volume' in df.columns:
        features['volume_delta'] = df['buy_volume'] - df['sell_volume']
        features['cumulative_volume_delta'] = features['volume_delta'].cumsum()
        features['cvd_slope'] = features['cumulative_volume_delta'].diff(5) / 5

        total_volume = df['buy_volume'] + df['sell_volume']
        features['order_flow_imbalance'] = features['volume_delta'] / (total_volume + 1e-8)

    # Large orders
    threshold = df['volume'].rolling(20).mean() * 2
    features['large_orders_count'] = (df['volume'] > threshold).rolling(30).sum()

    # Volume profile (calculate_volume_profile is a module helper)
    volume_profile = calculate_volume_profile(df, bins=50)
    features['poc_distance'] = (df['close'] - volume_profile['poc']) / df['close']

    return features
```

---
## Model Targets

### Target 1: AMD Phase (AMDDetector)

```python
TARGET_AMD_PHASE = {
    0: 'neutral',
    1: 'accumulation',
    2: 'manipulation',
    3: 'distribution'
}

def label_amd_phase(df, i, forward_window=20):
    """
    See ESTRATEGIA-AMD-COMPLETA.md for the labeling rules.
    """
    # Full implementation lives in the AMD document
    pass
```
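
The production rules live in ESTRATEGIA-AMD-COMPLETA.md. Purely for illustration, a toy labeler over the forward window (every threshold below is an arbitrary assumption, not the documented rules) could look like:

```python
import numpy as np
import pandas as pd

def label_amd_phase_sketch(df, i, forward_window=20):
    """Toy AMD-phase labeler (illustrative only; not the production rules)."""
    if i + forward_window >= len(df):
        return np.nan

    entry = df['close'].iloc[i]
    future = df.iloc[i + 1:i + 1 + forward_window]
    up = future['high'].max() - entry      # furthest move above entry
    down = entry - future['low'].min()     # furthest move below entry
    net = future['close'].iloc[-1] - entry

    if net > 0 and up > 2 * max(down, 0):
        return 1  # accumulation: shallow downside, sustained markup
    if net < 0 and down > 2 * max(up, 0):
        return 3  # distribution: shallow upside, sustained markdown
    total = up + down
    if up > 0 and down > 0 and abs(net) < 0.25 * total and min(up, down) / max(up, down) > 0.5:
        return 2  # manipulation: both sides swept, close back near entry
    return 0  # neutral
```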

### Target 2: Delta High/Low (RangePredictor)

```python
# Regression targets
TARGETS_RANGE = {
    'delta_high_15m': float,  # Continuous prediction
    'delta_low_15m': float,
    'delta_high_1h': float,
    'delta_low_1h': float,

    # Classification targets (bins)
    'bin_high_15m': int,  # 0-3
    'bin_low_15m': int,
    'bin_high_1h': int,
    'bin_low_1h': int
}

def calculate_range_targets(df, horizons={'15m': 3, '1h': 12}):
    # Horizons are in bars (3 and 12 bars of 5m data = 15m and 1h)
    targets = {}
    atr = calculate_atr(df, 14)       # module helper
    atr_pct = atr / df['close']       # same units as the fractional deltas below

    for name, periods in horizons.items():
        # Delta high
        targets[f'delta_high_{name}'] = (
            df['high'].rolling(periods).max().shift(-periods) - df['close']
        ) / df['close']

        # Delta low
        targets[f'delta_low_{name}'] = (
            df['close'] - df['low'].rolling(periods).min().shift(-periods)
        ) / df['close']

        # Bins (move normalized by ATR; both sides of the ratio are fractions
        # of price, so the result is in ATR multiples)
        def to_bin(delta_series):
            ratio = delta_series / atr_pct
            bins = pd.cut(
                ratio,
                bins=[-np.inf, 0.3, 0.7, 1.2, np.inf],
                labels=[0, 1, 2, 3]
            )
            return bins.astype(float)

        targets[f'bin_high_{name}'] = to_bin(targets[f'delta_high_{name}'])
        targets[f'bin_low_{name}'] = to_bin(targets[f'delta_low_{name}'])

    return pd.DataFrame(targets)
```
### Target 3: TP vs SL (TPSLClassifier)

```python
TARGETS_TPSL = {
    'tp_first_15m_rr_2_1': int,  # 0 or 1
    'tp_first_15m_rr_3_1': int,
    'tp_first_1h_rr_2_1': int,
    'tp_first_1h_rr_3_1': int
}

def calculate_tpsl_targets(df, rr_configs):
    """
    Simulates whether TP is reached before SL (long side).
    """
    targets = {}
    atr = calculate_atr(df, 14)

    for rr in rr_configs:
        sl_dist = atr * rr['sl_atr_multiple']
        tp_dist = atr * rr['tp_atr_multiple']

        def check_tp_first(i, horizon_bars):
            if i + horizon_bars >= len(df):
                return np.nan

            entry_price = df['close'].iloc[i]
            sl_price = entry_price - sl_dist.iloc[i]
            tp_price = entry_price + tp_dist.iloc[i]

            future = df.iloc[i+1:i+horizon_bars+1]

            for _, row in future.iterrows():
                # If both levels are hit within the same bar, SL is counted
                # first (conservative labeling)
                if row['low'] <= sl_price:
                    return 0  # SL hit first
                elif row['high'] >= tp_price:
                    return 1  # TP hit first

            return np.nan  # Neither hit

        for horizon_name, horizon_bars in [('15m', 3), ('1h', 12)]:
            target_name = f'tp_first_{horizon_name}_{rr["name"]}'
            targets[target_name] = [
                check_tp_first(i, horizon_bars) for i in range(len(df))
            ]

    return pd.DataFrame(targets)
```
### Target 4: Liquidity Sweep (LiquidityHunter)

```python
TARGETS_LIQUIDITY = {
    'bsl_sweep': int,  # 0 or 1
    'ssl_sweep': int,
    'any_sweep': int,
    'sweep_timing': int  # Bars until the sweep
}

def label_liquidity_sweep(df, i, forward_window=10):
    """
    Labels whether a liquidity sweep will occur.
    """
    if i + forward_window >= len(df):
        return {'bsl_sweep': np.nan, 'ssl_sweep': np.nan}

    swing_high = df['high'].iloc[max(0, i-20):i].max()
    swing_low = df['low'].iloc[max(0, i-20):i].min()

    future = df.iloc[i:i+forward_window]

    # BSL sweep (sweep of highs)
    bsl_hits = future['high'] >= swing_high * 1.005
    bsl_swept = bsl_hits.any()

    # SSL sweep (sweep of lows)
    ssl_hits = future['low'] <= swing_low * 0.995
    ssl_swept = ssl_hits.any()

    # Timing in bars (argmax over the boolean array gives the position of
    # the first hit, not the index label)
    if bsl_swept:
        sweep_timing = int(bsl_hits.values.argmax())
    elif ssl_swept:
        sweep_timing = int(ssl_hits.values.argmax())
    else:
        sweep_timing = np.nan

    return {
        'bsl_sweep': 1 if bsl_swept else 0,
        'ssl_sweep': 1 if ssl_swept else 0,
        'any_sweep': 1 if (bsl_swept or ssl_swept) else 0,
        'sweep_timing': sweep_timing
    }
```
### Target 5: Order Flow (OrderFlowAnalyzer)

```python
TARGETS_ORDER_FLOW = {
    'flow_type': int,  # 0=neutral, 1=accumulation, 2=distribution
    'institutional_activity': float  # 0-1 score
}

def label_order_flow(df, i, forward_window=50):
    """
    Based on CVD and large orders
    """
    if ('cumulative_volume_delta' not in df.columns
            or i + forward_window >= len(df)):
        return {'flow_type': 0, 'institutional_activity': 0.0}

    current_cvd = df['cumulative_volume_delta'].iloc[i]
    future_cvd = df['cumulative_volume_delta'].iloc[i + forward_window]

    cvd_change = future_cvd - current_cvd

    # Large orders in window
    large_orders = df['large_orders_count'].iloc[i:i+forward_window].sum()

    if cvd_change > 0 and large_orders > 5:
        flow_type = 1  # accumulation
    elif cvd_change < 0 and large_orders > 5:
        flow_type = 2  # distribution
    else:
        flow_type = 0  # neutral

    institutional_activity = min(1.0, large_orders / 10)

    return {
        'flow_type': flow_type,
        'institutional_activity': institutional_activity
    }
```

---
## Feature Engineering Pipeline

### Complete Pipeline

```python
class FeatureEngineeringPipeline:
    """
    Complete feature engineering pipeline
    """

    def __init__(self, config=None):
        self.config = config or {}
        self.scalers = {}

    def transform(self, df):
        """
        Transforms raw OHLCV into the full feature set
        """
        features = pd.DataFrame(index=df.index)

        # 1. Base features
        print("Extracting base features...")
        base = self._extract_base_features(df)
        features = pd.concat([features, base], axis=1)

        # 2. AMD features
        print("Extracting AMD features...")
        amd = self._extract_amd_features(df)
        features = pd.concat([features, amd], axis=1)

        # 3. ICT features
        print("Extracting ICT features...")
        ict = self._extract_ict_features(df)
        features = pd.concat([features, ict], axis=1)

        # 4. SMC features
        print("Extracting SMC features...")
        smc = self._extract_smc_features(df)
        features = pd.concat([features, smc], axis=1)

        # 5. Liquidity features
        print("Extracting liquidity features...")
        liquidity = self._extract_liquidity_features(df)
        features = pd.concat([features, liquidity], axis=1)

        # 6. Microstructure (if available)
        if 'buy_volume' in df.columns:
            print("Extracting microstructure features...")
            micro = self._extract_microstructure_features(df)
            features = pd.concat([features, micro], axis=1)

        # 7. Scaling
        print("Scaling features...")
        features_scaled = self._scale_features(features)

        # 8. Handle missing values
        features_scaled = features_scaled.ffill().fillna(0)

        return features_scaled

    def _extract_base_features(self, df):
        """Extracts the base features (21)"""
        features = {}

        # Volatility
        features.update(calculate_volatility_features(df))

        # Momentum
        features.update(calculate_momentum_features(df))

        # Moving averages
        features.update(calculate_ma_features(df))

        return pd.DataFrame(features)

    # The remaining extractors (_extract_amd_features, _extract_ict_features,
    # _extract_smc_features, _extract_liquidity_features,
    # _extract_microstructure_features) follow the same pattern, delegating
    # to the calculate_* functions above.

    def _scale_features(self, features):
        """Scales features using RobustScaler"""
        from sklearn.preprocessing import RobustScaler

        if not self.scalers:
            # Fit scalers
            for col in features.columns:
                self.scalers[col] = RobustScaler()
                features[col] = self.scalers[col].fit_transform(
                    features[col].values.reshape(-1, 1)
                ).ravel()
        else:
            # Transform with fitted scalers
            for col in features.columns:
                if col in self.scalers:
                    features[col] = self.scalers[col].transform(
                        features[col].values.reshape(-1, 1)
                    ).ravel()

        return features
```
### Using the Pipeline

```python
# Initialize
pipeline = FeatureEngineeringPipeline()

# Transform data
df_raw = load_ohlcv_data('BTCUSDT', '5m')
features = pipeline.transform(df_raw)

print(f"Features shape: {features.shape}")
print(f"Features: {features.columns.tolist()}")

# Features ready for ML models
X = features.values
```

---
## Technical Considerations

### 1. Preventing Look-Ahead Bias

**IMPORTANT:** Never use future data to compute features.

```python
# ✅ CORRECT
sma_20 = df['close'].rolling(20).mean()

# ❌ INCORRECT
sma_20 = df['close'].rolling(20, center=True).mean()  # Uses future data!
```
### 2. Handling Missing Values

```python
def handle_missing(features):
    """
    Imputation strategy
    """
    # 1. Forward fill (use the last known value)
    features = features.ffill()

    # 2. If NaNs remain at the start, use 0
    features = features.fillna(0)

    # 3. Alternative: use the median
    # features = features.fillna(features.median())

    return features
```
### 3. Feature Scaling

```python
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler

# Price-based features → RobustScaler (handles outliers)
price_scaler = RobustScaler()

# Indicators → StandardScaler
indicator_scaler = StandardScaler()

# Ratios/percentages → MinMaxScaler
ratio_scaler = MinMaxScaler(feature_range=(0, 1))
```
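
Putting the three scalers together, a sketch of group-wise scaling could look like the following. The column-to-scaler mapping is a hypothetical example; the key point is fitting on the train split only, so test statistics never leak into the transform.

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler

# Hypothetical grouping: which columns get which scaler
SCALER_GROUPS = {
    RobustScaler: ['atr_14', 'volume_ma'],
    StandardScaler: ['rsi_14', 'momentum_10'],
    MinMaxScaler: ['buying_pressure', 'ote_position'],
}

def fit_transform_by_group(train, test):
    """Fit each scaler on the train split only, then apply to both splits."""
    train, test = train.copy(), test.copy()
    for scaler_cls, cols in SCALER_GROUPS.items():
        cols = [c for c in cols if c in train.columns]
        if not cols:
            continue
        scaler = scaler_cls().fit(train[cols])
        train[cols] = scaler.transform(train[cols])
        test[cols] = scaler.transform(test[cols])
    return train, test
```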

### 4. Feature Selection

```python
def select_important_features(X, y, feature_names, model, top_n=50):
    """
    Selects the most important features
    """
    # Train model
    model.fit(X, y)

    # Get importance
    importance = pd.DataFrame({
        'feature': feature_names,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)

    # Select top N
    selected_features = importance.head(top_n)['feature'].tolist()

    return selected_features
```
### 5. Temporal Validation

```python
def temporal_validation_split(df, train_pct=0.7, val_pct=0.15):
    """
    Strict temporal split (no shuffling)
    """
    n = len(df)
    train_end = int(n * train_pct)
    val_end = int(n * (train_pct + val_pct))

    df_train = df.iloc[:train_end]
    df_val = df.iloc[train_end:val_end]
    df_test = df.iloc[val_end:]

    # Verify there is no overlap
    assert df_train.index[-1] < df_val.index[0]
    assert df_val.index[-1] < df_test.index[0]

    return df_train, df_val, df_test
```

---
## Dimension Summary

| Category | Features | Models |
|-----------|----------|---------|
| **Base Technical** | 21 | All |
| **AMD** | 25 | AMD, Range, TPSL |
| **ICT** | 15 | Range, TPSL |
| **SMC** | 12 | Range, TPSL |
| **Liquidity** | 10 | Liquidity, TPSL |
| **Microstructure** | 8 | OrderFlow |
| **TOTAL** | **91 features** | - |

---

**Document generated:** 2025-12-05
**Next review:** 2025-Q1
**Contact:** ml-engineering@trading.ai