---
id: "FEATURES-TARGETS-COMPLETO"
title: "Complete Features and Targets - Trading Platform ML Models"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---

# Complete Features and Targets - Trading Platform ML Models

**Version:** 2.0.0
**Date:** 2025-12-08
**Module:** OQI-006-ml-signals
**Author:** Trading Strategist - Trading Platform

---

## Table of Contents

1. [Overview](#overview)
2. [Features by Category](#features-by-category)
3. [Targets by Model](#targets-by-model)
4. [Feature Engineering Pipeline](#feature-engineering-pipeline)
5. [Validation and Testing](#validation-and-testing)

---

## Overview

### Feature Summary

| Category | Count | Used by |
|----------|-------|---------|
| Price Action | 12 | AMD, Range, TPSL |
| Volume | 10 | AMD, Range, TPSL |
| Volatility | 8 | AMD, Range, TPSL |
| Trend | 10 | AMD, Range, TPSL |
| Market Structure | 12 | AMD, SMC |
| Order Flow | 10 | AMD, Liquidity |
| Liquidity | 8 | Liquidity, TPSL |
| ICT | 15 | ICT Context, Range |
| SMC | 12 | AMD, TPSL |
| Time | 6 | All |
| **TOTAL** | **103** | - |

### Target Summary

| Model | Target Type | Classes/Values | Horizon |
|-------|-------------|----------------|---------|
| AMDDetector | Multiclass | 4 (neutral, acc, manip, dist) | 20 bars |
| RangePredictor | Regression | delta_high, delta_low | 15m, 1h |
| TPSLClassifier | Binary | 0/1 (SL first / TP first) | Variable |
| LiquidityHunter | Binary | 0/1 (no sweep / sweep) | 10 bars |
| ICTContextModel | Continuous | 0-1 score | Current |

---

## Features by Category

### 1. Price Action Features (12)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `range_ratio` | `(high - low) / SMA(high - low, 20)` | float | AMD, Range |
| 2 | `range_pct` | `(high - low) / close` | float | AMD, Range |
| 3 | `body_size` | `abs(close - open) / (high - low + 1e-8)` | float | AMD |
| 4 | `upper_wick` | `(high - max(close, open)) / (high - low + 1e-8)` | float | AMD |
| 5 | `lower_wick` | `(min(close, open) - low) / (high - low + 1e-8)` | float | AMD |
| 6 | `buying_pressure` | `(close - low) / (high - low + 1e-8)` | float | AMD, Range |
| 7 | `selling_pressure` | `(high - close) / (high - low + 1e-8)` | float | AMD, Range |
| 8 | `close_position` | `(close - low) / (high - low + 1e-8)` | float | AMD |
| 9 | `range_expansion` | `range_ratio > 1.3` | binary | AMD |
| 10 | `range_compression` | `range_ratio < 0.7` | binary | AMD |
| 11 | `gap_up` | `open > high.shift(1)` | binary | Range |
| 12 | `gap_down` | `open < low.shift(1)` | binary | Range |

```python
import pandas as pd


def extract_price_action_features(df):
    """Extract price action features."""
    f = {}

    hl_range = df['high'] - df['low']
    hl_range_safe = hl_range.replace(0, 1e-8)

    f['range_ratio'] = hl_range / hl_range.rolling(20).mean()
    f['range_pct'] = hl_range / df['close']
    f['body_size'] = abs(df['close'] - df['open']) / hl_range_safe
    f['upper_wick'] = (df['high'] - df[['close', 'open']].max(axis=1)) / hl_range_safe
    f['lower_wick'] = (df[['close', 'open']].min(axis=1) - df['low']) / hl_range_safe
    f['buying_pressure'] = (df['close'] - df['low']) / hl_range_safe
    f['selling_pressure'] = (df['high'] - df['close']) / hl_range_safe
    f['close_position'] = f['buying_pressure']
    f['range_expansion'] = (f['range_ratio'] > 1.3).astype(int)
    f['range_compression'] = (f['range_ratio'] < 0.7).astype(int)
    f['gap_up'] = (df['open'] > df['high'].shift(1)).astype(int)
    f['gap_down'] = (df['open'] < df['low'].shift(1)).astype(int)

    return pd.DataFrame(f)
```

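As a quick sanity check of the wick/pressure formulas above, here is one hand-made bar (open=100, high=110, low=95, close=105); by construction, body plus both wicks sum to the full range. This is an illustrative sketch, not part of the extractor.

```python
import pandas as pd

# One hand-made bullish bar: open=100, high=110, low=95, close=105
df = pd.DataFrame({'open': [100.0], 'high': [110.0], 'low': [95.0], 'close': [105.0]})
hl_range = df['high'] - df['low']                                         # 15
body = (df['close'] - df['open']).abs() / hl_range                        # 5/15
upper_wick = (df['high'] - df[['close', 'open']].max(axis=1)) / hl_range  # (110-105)/15
lower_wick = (df[['close', 'open']].min(axis=1) - df['low']) / hl_range   # (100-95)/15
buying_pressure = (df['close'] - df['low']) / hl_range                    # (105-95)/15
```

Body and wick fractions always partition the bar: `body + upper_wick + lower_wick == 1` for any non-degenerate bar.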
### 2. Volume Features (10)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `volume_ratio` | `volume / SMA(volume, 20)` | float | AMD, Range |
| 2 | `volume_trend` | `SMA(volume, 10) - SMA(volume, 30)` | float | AMD |
| 3 | `volume_spike` | `volume > SMA(volume, 20) * 2` | binary | AMD |
| 4 | `obv` | On Balance Volume | float | AMD |
| 5 | `obv_slope` | `(OBV - OBV.shift(5)) / 5` | float | AMD |
| 6 | `vwap` | Volume Weighted Average Price | float | Range |
| 7 | `vwap_distance` | `(close - vwap) / vwap` | float | Range |
| 8 | `volume_on_up` | `sum(vol if close > open) / total_vol` (20 bars) | float | AMD |
| 9 | `volume_on_down` | `sum(vol if close < open) / total_vol` (20 bars) | float | AMD |
| 10 | `volume_imbalance` | `volume_on_up - volume_on_down` | float | AMD |

```python
import pandas as pd


def extract_volume_features(df):
    """Extract volume features."""
    f = {}

    f['volume_ratio'] = df['volume'] / df['volume'].rolling(20).mean()
    f['volume_trend'] = df['volume'].rolling(10).mean() - df['volume'].rolling(30).mean()
    f['volume_spike'] = (df['volume'] > df['volume'].rolling(20).mean() * 2).astype(int)

    # OBV (bars with close <= previous close contribute with sign -1)
    obv_direction = (df['close'] > df['close'].shift(1)).astype(int) * 2 - 1
    f['obv'] = (df['volume'] * obv_direction).cumsum()
    f['obv_slope'] = f['obv'].diff(5) / 5

    # VWAP (cumulative over the whole frame, not session-anchored)
    typical_price = (df['high'] + df['low'] + df['close']) / 3
    f['vwap'] = (typical_price * df['volume']).cumsum() / df['volume'].cumsum()
    f['vwap_distance'] = (df['close'] - f['vwap']) / f['vwap']

    # Volume distribution across up and down bars
    up_bars = (df['close'] > df['open']).astype(int)
    down_bars = (df['close'] < df['open']).astype(int)

    f['volume_on_up'] = (df['volume'] * up_bars).rolling(20).sum() / df['volume'].rolling(20).sum()
    f['volume_on_down'] = (df['volume'] * down_bars).rolling(20).sum() / df['volume'].rolling(20).sum()
    f['volume_imbalance'] = f['volume_on_up'] - f['volume_on_down']

    return pd.DataFrame(f)
```

### 3. Volatility Features (8)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `atr` | Average True Range (14) | float | All |
| 2 | `atr_ratio` | `ATR / SMA(ATR, 50)` | float | AMD, Range |
| 3 | `atr_percentile` | ATR percentile over 100 bars | float | Range |
| 4 | `volatility_10` | `std(returns, 10)` | float | Range |
| 5 | `volatility_20` | `std(returns, 20)` | float | Range |
| 6 | `volatility_50` | `std(returns, 50)` | float | Range |
| 7 | `volatility_ratio` | `volatility_10 / volatility_50` | float | AMD |
| 8 | `bollinger_width` | `(BB_upper - BB_lower) / BB_middle` | float | Range |

```python
import pandas as pd


def extract_volatility_features(df):
    """Extract volatility features."""
    f = {}

    # ATR (true range averaged over 14 bars)
    tr = pd.concat([
        df['high'] - df['low'],
        abs(df['high'] - df['close'].shift(1)),
        abs(df['low'] - df['close'].shift(1))
    ], axis=1).max(axis=1)

    f['atr'] = tr.rolling(14).mean()
    f['atr_ratio'] = f['atr'] / f['atr'].rolling(50).mean()
    f['atr_percentile'] = f['atr'].rolling(100).apply(
        lambda x: pd.Series(x).rank(pct=True).iloc[-1]
    )

    # Returns volatility
    returns = df['close'].pct_change()
    f['volatility_10'] = returns.rolling(10).std()
    f['volatility_20'] = returns.rolling(20).std()
    f['volatility_50'] = returns.rolling(50).std()
    f['volatility_ratio'] = f['volatility_10'] / f['volatility_50']

    # Bollinger Band width
    sma_20 = df['close'].rolling(20).mean()
    std_20 = df['close'].rolling(20).std()
    bb_upper = sma_20 + 2 * std_20
    bb_lower = sma_20 - 2 * std_20
    f['bollinger_width'] = (bb_upper - bb_lower) / sma_20

    return pd.DataFrame(f)
```

### 4. Trend Features (10)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `sma_10` | Simple Moving Average (10) | float | Range |
| 2 | `sma_20` | Simple Moving Average (20) | float | Range |
| 3 | `sma_50` | Simple Moving Average (50) | float | Range |
| 4 | `close_sma_10_ratio` | `close / SMA_10` | float | AMD |
| 5 | `close_sma_20_ratio` | `close / SMA_20` | float | AMD |
| 6 | `sma_slope_20` | `(SMA_20 - SMA_20.shift(5)) / 5` | float | AMD |
| 7 | `trend_strength` | `abs(sma_slope_20) / ATR` | float | AMD |
| 8 | `adx` | Average Directional Index (14) | float | AMD |
| 9 | `plus_di` | +DI (14) | float | Range |
| 10 | `minus_di` | -DI (14) | float | Range |

```python
import pandas as pd


def extract_trend_features(df):
    """Extract trend features."""
    f = {}

    # SMAs
    f['sma_10'] = df['close'].rolling(10).mean()
    f['sma_20'] = df['close'].rolling(20).mean()
    f['sma_50'] = df['close'].rolling(50).mean()

    f['close_sma_10_ratio'] = df['close'] / f['sma_10']
    f['close_sma_20_ratio'] = df['close'] / f['sma_20']
    f['sma_slope_20'] = f['sma_20'].diff(5) / 5

    # Trend strength, normalized by ATR
    atr = calculate_atr(df, 14)
    f['trend_strength'] = abs(f['sma_slope_20']) / atr

    # ADX
    f['adx'], f['plus_di'], f['minus_di'] = calculate_adx(df, 14)

    return pd.DataFrame(f)


def calculate_adx(df, period=14):
    """Compute ADX, +DI, -DI (directional movement)."""
    up_move = df['high'].diff()
    down_move = -df['low'].diff()

    # Evaluate both conditions on the raw moves before zeroing either series
    plus_dm = up_move.where((up_move > down_move) & (up_move > 0), 0)
    minus_dm = down_move.where((down_move > up_move) & (down_move > 0), 0)

    tr = pd.concat([
        df['high'] - df['low'],
        abs(df['high'] - df['close'].shift(1)),
        abs(df['low'] - df['close'].shift(1))
    ], axis=1).max(axis=1)

    atr = tr.rolling(period).mean()
    plus_di = 100 * (plus_dm.rolling(period).mean() / atr)
    minus_di = 100 * (minus_dm.rolling(period).mean() / atr)

    # Epsilon guards against division by zero when both DIs are 0
    dx = 100 * abs(plus_di - minus_di) / (plus_di + minus_di + 1e-8)
    adx = dx.rolling(period).mean()

    return adx, plus_di, minus_di
```

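The code above (and the target calculations later in this document) calls `calculate_atr(df, period)`, which is never defined here. A plausible sketch, matching the true-range construction used in `extract_volatility_features`, is:

```python
import pandas as pd

def calculate_atr(df, period=14):
    """Average True Range: rolling mean of the true range (assumed helper)."""
    tr = pd.concat([
        df['high'] - df['low'],
        (df['high'] - df['close'].shift(1)).abs(),
        (df['low'] - df['close'].shift(1)).abs(),
    ], axis=1).max(axis=1)
    return tr.rolling(period).mean()

# Quick check on flat synthetic data: true range collapses to high - low
data = pd.DataFrame({'high': [2.0] * 20, 'low': [1.0] * 20, 'close': [1.5] * 20})
atr_flat = calculate_atr(data, period=14)
```

Note this uses a simple rolling mean; the project's real helper may use Wilder's exponential smoothing instead, which gives slightly different values.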
### 5. Market Structure Features (12)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `higher_highs_count` | Count of HH over 20 bars | int | AMD |
| 2 | `higher_lows_count` | Count of HL over 20 bars | int | AMD |
| 3 | `lower_highs_count` | Count of LH over 20 bars | int | AMD |
| 4 | `lower_lows_count` | Count of LL over 20 bars | int | AMD |
| 5 | `swing_high_distance` | Distance to nearest swing high | float | Liquidity |
| 6 | `swing_low_distance` | Distance to nearest swing low | float | Liquidity |
| 7 | `bos_bullish_count` | Count of bullish BOS over 30 bars | int | SMC |
| 8 | `bos_bearish_count` | Count of bearish BOS over 30 bars | int | SMC |
| 9 | `choch_bullish_count` | Count of bullish CHOCH over 30 bars | int | SMC |
| 10 | `choch_bearish_count` | Count of bearish CHOCH over 30 bars | int | SMC |
| 11 | `structure_score` | Structure score (-1 to +1) | float | AMD |
| 12 | `structure_alignment` | Alignment with trend | binary | Range |

```python
import numpy as np
import pandas as pd


def extract_market_structure_features(df, lookback=20):
    """Extract market structure features (helpers such as detect_swing_points
    are defined elsewhere in the module)."""
    f = {}

    # Higher highs/lows, lower highs/lows
    f['higher_highs_count'] = (df['high'] > df['high'].shift(1)).rolling(lookback).sum()
    f['higher_lows_count'] = (df['low'] > df['low'].shift(1)).rolling(lookback).sum()
    f['lower_highs_count'] = (df['high'] < df['high'].shift(1)).rolling(lookback).sum()
    f['lower_lows_count'] = (df['low'] < df['low'].shift(1)).rolling(lookback).sum()

    # Swing distances
    swing_highs = detect_swing_points(df, 'high', lookback)
    swing_lows = detect_swing_points(df, 'low', lookback)

    f['swing_high_distance'] = calculate_distance_to_nearest(df['close'], swing_highs, 'above')
    f['swing_low_distance'] = calculate_distance_to_nearest(df['close'], swing_lows, 'below')

    # BOS and CHOCH counts
    bos_signals = detect_bos(df, lookback)
    choch_signals = detect_choch(df, lookback)

    f['bos_bullish_count'] = count_signals(bos_signals, 'bullish', 30)
    f['bos_bearish_count'] = count_signals(bos_signals, 'bearish', 30)
    f['choch_bullish_count'] = count_signals(choch_signals, 'bullish', 30)
    f['choch_bearish_count'] = count_signals(choch_signals, 'bearish', 30)

    # Structure score
    bullish_points = f['higher_highs_count'] + f['higher_lows_count']
    bearish_points = f['lower_highs_count'] + f['lower_lows_count']
    total_points = bullish_points + bearish_points + 1e-8
    f['structure_score'] = (bullish_points - bearish_points) / total_points

    # Structure alignment
    trend_direction = np.sign(df['close'].rolling(20).mean().diff(5))
    f['structure_alignment'] = (np.sign(f['structure_score']) == trend_direction).astype(int)

    return pd.DataFrame(f)
```

### 6. Order Flow Features (10)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `order_blocks_bullish` | Count of bullish OBs over 30 bars | int | AMD |
| 2 | `order_blocks_bearish` | Count of bearish OBs over 30 bars | int | AMD |
| 3 | `ob_net` | `OB_bullish - OB_bearish` | int | AMD |
| 4 | `fvg_bullish_count` | Count of unfilled bullish FVGs | int | Range |
| 5 | `fvg_bearish_count` | Count of unfilled bearish FVGs | int | Range |
| 6 | `fvg_nearest_distance` | Distance to nearest FVG | float | Range |
| 7 | `false_breakout_count` | Count of false breakouts over 30 bars | int | AMD |
| 8 | `whipsaw_intensity` | Frequency of rapid reversals | float | AMD |
| 9 | `reversal_count` | Count of reversals over 20 bars | int | AMD |
| 10 | `displacement_strength` | Strength of the latest displacement | float | SMC |

```python
import pandas as pd


def extract_order_flow_features(df, lookback=30):
    """Extract order flow features (helper detectors are defined elsewhere)."""
    f = {}

    # Order blocks
    ob_bullish = identify_order_blocks(df, 'bullish')
    ob_bearish = identify_order_blocks(df, 'bearish')

    f['order_blocks_bullish'] = count_recent(ob_bullish, lookback)
    f['order_blocks_bearish'] = count_recent(ob_bearish, lookback)
    f['ob_net'] = f['order_blocks_bullish'] - f['order_blocks_bearish']

    # Fair Value Gaps
    fvg_bullish = identify_fvg(df, 'bullish')
    fvg_bearish = identify_fvg(df, 'bearish')

    f['fvg_bullish_count'] = count_unfilled_fvg(fvg_bullish, df['close'])
    f['fvg_bearish_count'] = count_unfilled_fvg(fvg_bearish, df['close'])
    f['fvg_nearest_distance'] = calculate_nearest_fvg_distance(df, fvg_bullish + fvg_bearish)

    # False breakouts and whipsaws
    f['false_breakout_count'] = count_false_breakouts(df, lookback)
    f['whipsaw_intensity'] = calculate_whipsaw_intensity(df, lookback)

    # Reversals (note: shift(-1) peeks one bar ahead; lag this feature before
    # using it for live inference to avoid look-ahead bias)
    price_changes = df['close'].pct_change()
    reversals = ((price_changes > 0.005) & (price_changes.shift(-1) < -0.005)) | \
                ((price_changes < -0.005) & (price_changes.shift(-1) > 0.005))
    f['reversal_count'] = reversals.rolling(20).sum()

    # Displacement
    f['displacement_strength'] = calculate_displacement_strength(df)

    return pd.DataFrame(f)
```

### 7. Liquidity Features (8)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `bsl_distance` | Distance to Buy Side Liquidity | float | Liquidity |
| 2 | `ssl_distance` | Distance to Sell Side Liquidity | float | Liquidity |
| 3 | `bsl_strength` | Number of stops accumulated above | int | Liquidity |
| 4 | `ssl_strength` | Number of stops accumulated below | int | Liquidity |
| 5 | `liquidity_grab_count` | Count of recent grabs (20 bars) | int | AMD |
| 6 | `time_since_bsl_sweep` | Bars since last BSL sweep | int | Liquidity |
| 7 | `time_since_ssl_sweep` | Bars since last SSL sweep | int | Liquidity |
| 8 | `liquidity_imbalance` | `(BSL_strength - SSL_strength) / total` | float | Liquidity |

```python
import pandas as pd


def extract_liquidity_features(df, lookback=20):
    """Extract liquidity features (sweep/grab helpers are defined elsewhere)."""
    f = {}

    # Identify liquidity pools (note: centered windows use future bars, so
    # these are suitable for labeling/analysis rather than live inference)
    swing_highs = df['high'].rolling(lookback, center=True).max()
    swing_lows = df['low'].rolling(lookback, center=True).min()

    # Distances to liquidity
    f['bsl_distance'] = (swing_highs - df['close']) / df['close']
    f['ssl_distance'] = (df['close'] - swing_lows) / df['close']

    # Liquidity strength (number of swing points)
    f['bsl_strength'] = count_swing_points_above(df, lookback)
    f['ssl_strength'] = count_swing_points_below(df, lookback)

    # Liquidity grabs
    f['liquidity_grab_count'] = count_liquidity_grabs(df, lookback)

    # Time since sweeps
    bsl_sweeps = detect_bsl_sweeps(df)
    ssl_sweeps = detect_ssl_sweeps(df)

    f['time_since_bsl_sweep'] = bars_since_last(bsl_sweeps)
    f['time_since_ssl_sweep'] = bars_since_last(ssl_sweeps)

    # Liquidity imbalance
    total_liquidity = f['bsl_strength'] + f['ssl_strength'] + 1e-8
    f['liquidity_imbalance'] = (f['bsl_strength'] - f['ssl_strength']) / total_liquidity

    return pd.DataFrame(f)
```

### 8. ICT Features (15)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `ote_position` | Position within Fibonacci range (0-1) | float | ICT |
| 2 | `in_discount_zone` | Price in the 21-38% Fib zone | binary | ICT |
| 3 | `in_premium_zone` | Price in the 62-79% Fib zone | binary | ICT |
| 4 | `in_ote_buy_zone` | Optimal buy zone (discount, 21-38%) | binary | ICT |
| 5 | `in_ote_sell_zone` | Optimal sell zone (premium, 62-79%) | binary | ICT |
| 6 | `is_london_kz` | In the London Open Killzone | binary | ICT |
| 7 | `is_ny_kz` | In the NY AM Killzone | binary | ICT |
| 8 | `is_asian_kz` | In the Asian Killzone | binary | ICT |
| 9 | `killzone_strength` | Session strength (0-1) | float | ICT |
| 10 | `session_overlap` | In the London/NY overlap | binary | ICT |
| 11 | `weekly_range_position` | Position within the weekly range | float | ICT |
| 12 | `daily_range_position` | Position within the daily range | float | ICT |
| 13 | `mmsm_detected` | Market Maker Sell Model | binary | ICT |
| 14 | `mmbm_detected` | Market Maker Buy Model | binary | ICT |
| 15 | `po3_phase` | Power of 3 phase (1-3) | int | ICT |

```python
import pandas as pd


def extract_ict_features(df, timestamps):
    """Extract ICT features (killzone/session helpers are defined elsewhere)."""
    f = {}

    # OTE zones
    swing_high = df['high'].rolling(50).max()
    swing_low = df['low'].rolling(50).min()
    range_size = swing_high - swing_low

    f['ote_position'] = (df['close'] - swing_low) / range_size
    f['in_discount_zone'] = ((f['ote_position'] >= 0.21) & (f['ote_position'] <= 0.38)).astype(int)
    f['in_premium_zone'] = ((f['ote_position'] >= 0.62) & (f['ote_position'] <= 0.79)).astype(int)
    f['in_ote_buy_zone'] = f['in_discount_zone']
    f['in_ote_sell_zone'] = f['in_premium_zone']

    # Killzones
    killzones = identify_killzones(timestamps)
    f['is_london_kz'] = (killzones == 'london_open').astype(int)
    f['is_ny_kz'] = (killzones == 'ny_am').astype(int)
    f['is_asian_kz'] = (killzones == 'asian').astype(int)

    f['killzone_strength'] = get_killzone_strength(killzones)
    f['session_overlap'] = ((killzones == 'london_close') | (killzones == 'ny_am')).astype(int)

    # Range positions
    f['weekly_range_position'] = calculate_weekly_position(df)
    f['daily_range_position'] = calculate_daily_position(df)

    # Market Maker Models
    f['mmsm_detected'] = detect_mmsm(df)
    f['mmbm_detected'] = detect_mmbm(df)

    # Power of 3
    f['po3_phase'] = calculate_po3_phase(df, timestamps)

    return pd.DataFrame(f)
```

### 9. SMC Features (12)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `choch_bullish_recent` | Bullish CHOCH within 30 bars | binary | SMC |
| 2 | `choch_bearish_recent` | Bearish CHOCH within 30 bars | binary | SMC |
| 3 | `bos_bullish_recent` | Bullish BOS within 30 bars | binary | SMC |
| 4 | `bos_bearish_recent` | Bearish BOS within 30 bars | binary | SMC |
| 5 | `inducement_bullish` | Bullish inducement detected | binary | SMC |
| 6 | `inducement_bearish` | Bearish inducement detected | binary | SMC |
| 7 | `displacement_bullish` | Recent bullish displacement | binary | SMC |
| 8 | `displacement_bearish` | Recent bearish displacement | binary | SMC |
| 9 | `liquidity_void_distance` | Distance to nearest void | float | SMC |
| 10 | `structure_bullish_score` | Bullish structure score | float | SMC |
| 11 | `structure_bearish_score` | Bearish structure score | float | SMC |
| 12 | `smc_confluence_score` | SMC confluence score | float | SMC |

```python
import pandas as pd


def extract_smc_features(df, lookback=30):
    """Extract SMC features (signal detectors are defined elsewhere)."""
    f = {}

    # CHOCH
    choch_signals = detect_choch(df, lookback)
    f['choch_bullish_recent'] = has_recent_signal(choch_signals, 'bullish', 30)
    f['choch_bearish_recent'] = has_recent_signal(choch_signals, 'bearish', 30)

    # BOS
    bos_signals = detect_bos(df, lookback)
    f['bos_bullish_recent'] = has_recent_signal(bos_signals, 'bullish', 30)
    f['bos_bearish_recent'] = has_recent_signal(bos_signals, 'bearish', 30)

    # Inducement
    inducements = detect_inducement(df)
    f['inducement_bullish'] = has_recent_signal(inducements, 'bullish', 20)
    f['inducement_bearish'] = has_recent_signal(inducements, 'bearish', 20)

    # Displacement
    displacements = detect_displacement(df)
    f['displacement_bullish'] = has_recent_signal(displacements, 'bullish', 10)
    f['displacement_bearish'] = has_recent_signal(displacements, 'bearish', 10)

    # Liquidity voids
    voids = detect_liquidity_voids(df)
    f['liquidity_void_distance'] = calculate_nearest_void_distance(df['close'], voids)

    # Structure scores
    f['structure_bullish_score'] = calculate_bullish_structure_score(df)
    f['structure_bearish_score'] = calculate_bearish_structure_score(df)

    # SMC confluence
    f['smc_confluence_score'] = calculate_smc_confluence(f)

    return pd.DataFrame(f)
```

### 10. Time Features (6)

| # | Feature | Formula | Type | Model(s) |
|---|---------|---------|------|----------|
| 1 | `hour_sin` | `sin(2 * pi * hour / 24)` | float | All |
| 2 | `hour_cos` | `cos(2 * pi * hour / 24)` | float | All |
| 3 | `day_of_week` | Day of the week (0-6) | int | All |
| 4 | `is_weekend` | Saturday or Sunday | binary | All |
| 5 | `time_in_session` | Minutes since session open | int | ICT |
| 6 | `minutes_to_close` | Minutes until session close | int | ICT |

```python
import numpy as np
import pandas as pd


def extract_time_features(timestamps):
    """Extract time-of-day features (expects a pandas DatetimeIndex)."""
    f = {}

    hours = timestamps.hour
    f['hour_sin'] = np.sin(2 * np.pi * hours / 24)
    f['hour_cos'] = np.cos(2 * np.pi * hours / 24)
    f['day_of_week'] = timestamps.dayofweek
    f['is_weekend'] = (timestamps.dayofweek >= 5).astype(int)

    # Session timing
    f['time_in_session'] = calculate_time_in_session(timestamps)
    f['minutes_to_close'] = calculate_minutes_to_close(timestamps)

    return pd.DataFrame(f)
```

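The sin/cos pair exists because the raw hour is discontinuous at midnight: 23:00 and 00:00 are one hour apart in time but 23 apart numerically. In the cyclical embedding they are close, as this small illustrative check shows:

```python
import numpy as np

def hour_embedding(hour):
    """Map an hour (0-23) onto the unit circle."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle)])

# Adjacent hours land close together; opposite hours land far apart
dist_23_0 = np.linalg.norm(hour_embedding(23) - hour_embedding(0))  # small
dist_12_0 = np.linalg.norm(hour_embedding(12) - hour_embedding(0))  # maximal (2.0)
```

The same encoding would apply to day-of-week if that feature were also treated as cyclical; here the document keeps `day_of_week` as a raw integer.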
---

## Targets by Model

### 1. AMDDetector Target

**Type:** Multiclass Classification (4 classes)

| Class | Value | Description |
|-------|-------|-------------|
| Neutral | 0 | No clearly defined phase |
| Accumulation | 1 | Accumulation phase |
| Manipulation | 2 | Manipulation phase |
| Distribution | 3 | Distribution phase |

**Labeling Method:**

```python
def label_amd_phase(df, i, forward_window=20):
    """
    Label the AMD phase based on future behavior.

    Criteria:
    - Accumulation: narrow range + price rises afterwards
    - Manipulation: false breakouts + whipsaws
    - Distribution: volume on declines + price falls afterwards
    - Neutral: none of the above clearly met
    """
    if i + forward_window >= len(df):
        return 0  # neutral

    future = df.iloc[i:i+forward_window]
    current_price = df['close'].iloc[i]

    # Forward-window metrics
    price_range_pct = (future['high'].max() - future['low'].min()) / current_price
    final_price = future['close'].iloc[-1]
    price_change = (final_price - current_price) / current_price

    # Volume
    volume_first_half = future['volume'].iloc[:10].mean()
    volume_second_half = future['volume'].iloc[10:].mean()

    # False breakouts
    false_breaks = count_false_breakouts_forward(df, i, forward_window)

    # ACCUMULATION criteria
    if price_range_pct < 0.02:  # range < 2%
        if price_change > 0.01:  # rises > 1% afterwards
            if volume_second_half < volume_first_half:  # declining volume
                return 1  # accumulation

    # MANIPULATION criteria
    if false_breaks >= 2:  # 2+ false breakouts
        whipsaw_count = count_whipsaws_forward(df, i, forward_window)
        if whipsaw_count >= 3:
            return 2  # manipulation

    # DISTRIBUTION criteria
    if price_change < -0.015:  # falls > 1.5%
        # High volume on declines
        down_volume = calculate_volume_on_down_moves(future)
        if down_volume > 0.6:  # 60%+ of volume on declines
            return 3  # distribution

    return 0  # neutral


def count_false_breakouts_forward(df, i, window):
    """Count false breakouts in the forward window."""
    future = df.iloc[i:i+window]
    resistance = df['high'].iloc[max(0, i-20):i].max()
    support = df['low'].iloc[max(0, i-20):i].min()

    false_breaks = 0
    for j in range(1, len(future)):
        # False breakout above resistance
        if future['high'].iloc[j] > resistance * 1.005:
            if future['close'].iloc[j] < resistance:
                false_breaks += 1
        # False breakdown below support
        if future['low'].iloc[j] < support * 0.995:
            if future['close'].iloc[j] > support:
                false_breaks += 1

    return false_breaks
```

**Expected Class Balance:**
- Neutral: ~40%
- Accumulation: ~20%
- Manipulation: ~20%
- Distribution: ~20%

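Once `label_amd_phase` has been applied bar by bar, the realized distribution can be compared against these expectations; a strongly imbalanced result (say, well over half neutral) suggests the thresholds need retuning for the instrument's volatility. A minimal sketch, using stand-in labels rather than output from a real dataset:

```python
import pandas as pd

# Stand-in labels with the expected 40/20/20/20 split (not real model output)
labels = pd.Series([0] * 40 + [1] * 20 + [2] * 20 + [3] * 20)

# Realized class proportions, indexed by class value 0-3
dist = labels.value_counts(normalize=True).sort_index()
```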
### 2. RangePredictor Target

**Type:** Regression (continuous) + Binned Classification

**Regression Targets:**

| Target | Formula | Horizon |
|--------|---------|---------|
| `delta_high_15m` | `(max_high_3bars - close) / close` | 15 min |
| `delta_low_15m` | `(close - min_low_3bars) / close` | 15 min |
| `delta_high_1h` | `(max_high_12bars - close) / close` | 1 hour |
| `delta_low_1h` | `(close - min_low_12bars) / close` | 1 hour |

**Binned Targets:**

| Bin | Range (ATR multiple) | Description |
|-----|----------------------|-------------|
| 0 | < 0.3 ATR | Very low |
| 1 | 0.3 - 0.7 ATR | Low |
| 2 | 0.7 - 1.2 ATR | Medium |
| 3 | > 1.2 ATR | High |

```python
import numpy as np
import pandas as pd


def calculate_range_targets(df, horizons={'15m': 3, '1h': 12}):
    """Compute targets for RangePredictor."""
    targets = {}
    atr = calculate_atr(df, 14)

    for name, periods in horizons.items():
        # Regression targets: extreme high/low over the next `periods` bars
        future_high = df['high'].rolling(periods).max().shift(-periods)
        future_low = df['low'].rolling(periods).min().shift(-periods)

        targets[f'delta_high_{name}'] = (future_high - df['close']) / df['close']
        targets[f'delta_low_{name}'] = (df['close'] - future_low) / df['close']

        # Binned targets (delta expressed as a multiple of ATR)
        for target_type in ['high', 'low']:
            delta = targets[f'delta_{target_type}_{name}']
            atr_ratio = delta / (atr / df['close'])

            bins = pd.cut(
                atr_ratio,
                bins=[-np.inf, 0.3, 0.7, 1.2, np.inf],
                labels=[0, 1, 2, 3]
            )
            targets[f'bin_{target_type}_{name}'] = bins

    return pd.DataFrame(targets)
```

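The bin edges map onto the table above as follows; note that `pd.cut` builds right-closed intervals, so a value sitting exactly on an edge (e.g. 0.3) falls into the lower bin. A small standalone check:

```python
import numpy as np
import pandas as pd

# Hand-picked ATR multiples spanning all four bins from the table above
atr_ratio = pd.Series([0.1, 0.3, 0.5, 1.0, 2.0])
bins = pd.cut(atr_ratio, bins=[-np.inf, 0.3, 0.7, 1.2, np.inf], labels=[0, 1, 2, 3])
# 0.1 -> bin 0, 0.3 -> bin 0 (edge, right-closed), 0.5 -> 1, 1.0 -> 2, 2.0 -> 3
```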
### 3. TPSLClassifier Target

**Type:** Binary Classification

| Value | Description |
|-------|-------------|
| 0 | Stop Loss is hit first |
| 1 | Take Profit is hit first |

**R:R Configurations:**

| Config | SL Distance | TP Distance | R:R |
|--------|-------------|-------------|-----|
| `rr_2_1` | 0.3 ATR | 0.6 ATR | 2:1 |
| `rr_3_1` | 0.3 ATR | 0.9 ATR | 3:1 |
| `rr_4_1` | 0.25 ATR | 1.0 ATR | 4:1 |

```python
import numpy as np
import pandas as pd


def calculate_tpsl_targets(df, horizons, rr_configs):
    """
    Compute targets for TPSLClassifier.

    Returns 1 if TP is hit first, 0 if SL is hit first, NaN if neither.
    """
    targets = {}
    atr = calculate_atr(df, 14)

    for horizon_name, max_bars in horizons.items():
        for rr in rr_configs:
            target_name = f'tp_first_{horizon_name}_{rr["name"]}'

            sl_distance = atr * rr['sl_atr_multiple']
            tp_distance = atr * rr['tp_atr_multiple']

            results = []
            for i in range(len(df)):
                if i + max_bars >= len(df):
                    results.append(np.nan)
                    continue

                entry_price = df['close'].iloc[i]
                sl_price = entry_price - sl_distance.iloc[i]
                tp_price = entry_price + tp_distance.iloc[i]

                # Simulate forward
                result = simulate_trade_outcome(
                    df.iloc[i+1:i+max_bars+1],
                    entry_price,
                    sl_price,
                    tp_price
                )
                results.append(result)

            targets[target_name] = results

    return pd.DataFrame(targets)


def simulate_trade_outcome(future_bars, entry, sl, tp):
    """
    Simulate the trade outcome.

    Returns: 1 (TP first), 0 (SL first), NaN (neither hit).
    """
    for _, row in future_bars.iterrows():
        # Check SL first (worst-case assumption when a bar spans both levels)
        if row['low'] <= sl:
            return 0
        # Check TP
        if row['high'] >= tp:
            return 1

    return np.nan  # neither hit within the window
```

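The SL-first check is a conservative tie-break: when a single bar spans both levels, the label assumes the stop was hit before the target. A worked illustration, re-stating the function locally so the example is self-contained:

```python
import numpy as np
import pandas as pd

def simulate_trade_outcome(future_bars, entry, sl, tp):
    """Local re-statement of the document's function, for illustration only."""
    for _, row in future_bars.iterrows():
        if row['low'] <= sl:    # SL checked first: worst-case intra-bar ordering
            return 0
        if row['high'] >= tp:
            return 1
    return np.nan

# One bar spanning BOTH levels -> labeled SL-first (0) under the convention
bars_both = pd.DataFrame({'high': [106.0], 'low': [94.0]})
out_both = simulate_trade_outcome(bars_both, entry=100.0, sl=95.0, tp=105.0)

# One bar that only reaches TP -> labeled TP-first (1)
bars_tp = pd.DataFrame({'high': [106.0], 'low': [99.0]})
out_tp = simulate_trade_outcome(bars_tp, entry=100.0, sl=95.0, tp=105.0)
```

This makes the TP-first labels slightly pessimistic, which is usually preferable to optimistic labels when training an entry filter.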
### 4. LiquidityHunter Target

**Type:** Binary Classification

| Value | Description |
|-------|-------------|
| 0 | No liquidity sweep |
| 1 | Liquidity sweep occurs |

**Sweep Types:**

| Target | Description |
|--------|-------------|
| `bsl_sweep` | Buy Side Liquidity sweep |
| `ssl_sweep` | Sell Side Liquidity sweep |
| `any_sweep` | Any sweep |

```python
import numpy as np
import pandas as pd


def calculate_liquidity_targets(df, forward_window=10, sweep_threshold=0.005):
    """
    Computes targets for LiquidityHunter.
    """
    targets = {}

    for i in range(len(df) - forward_window):
        # Current liquidity levels
        lookback = df.iloc[max(0, i - 20):i]
        swing_high = lookback['high'].max()
        swing_low = lookback['low'].min()

        # Future price action
        future = df.iloc[i:i + forward_window]

        # BSL sweep (price goes above the swing high, then reverses)
        bsl_level = swing_high * (1 + sweep_threshold)
        bsl_swept = (future['high'] >= bsl_level).any()
        bsl_reversed = bsl_swept and (future['close'].iloc[-1] < swing_high)

        # SSL sweep (price goes below the swing low, then reverses)
        ssl_level = swing_low * (1 - sweep_threshold)
        ssl_swept = (future['low'] <= ssl_level).any()
        ssl_reversed = ssl_swept and (future['close'].iloc[-1] > swing_low)

        targets.setdefault('bsl_sweep', []).append(1 if bsl_reversed else 0)
        targets.setdefault('ssl_sweep', []).append(1 if ssl_reversed else 0)
        targets.setdefault('any_sweep', []).append(1 if (bsl_reversed or ssl_reversed) else 0)

    # Pad the trailing indices that lack a full forward window
    for key in targets:
        targets[key].extend([np.nan] * forward_window)

    return pd.DataFrame(targets)
```
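To make the BSL branch concrete, here is a single toy window walked through by hand (all numbers invented): the future high pokes about 0.8% above the prior swing high of 101.0, clearing the 0.5% threshold, and the final close settles back below the swing high, so the window labels as a BSL sweep.

```python
import pandas as pd

sweep_threshold = 0.005
lookback = pd.DataFrame({'high': [100.0, 101.0, 100.5], 'low': [99.0, 99.5, 99.2]})
future = pd.DataFrame({'high': [101.8, 101.2], 'low': [100.2, 99.8], 'close': [100.4, 100.2]})

swing_high = lookback['high'].max()                                   # 101.0
bsl_level = swing_high * (1 + sweep_threshold)                        # 101.505
bsl_swept = (future['high'] >= bsl_level).any()                       # 101.8 pokes above
bsl_reversed = bsl_swept and (future['close'].iloc[-1] < swing_high)  # close back below

print(bool(bsl_swept), bool(bsl_reversed))  # -> True True
```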

### 5. ICTContextModel Target

**Type:** Continuous Score (0-1)

This model has no traditional target; instead, it computes a real-time score based on ICT context.

```python
def calculate_ict_context_score(df, timestamps):
    """
    Computes the ICT context score (0-1).

    Factors:
    - Killzone strength (40%)
    - OTE position alignment (30%)
    - Range position (20%)
    - MM model detection (10%)
    """
    score = 0.0

    # Killzone
    killzone = identify_killzone(timestamps.iloc[-1])
    kz_strength = get_killzone_strength(killzone)
    score += 0.40 * kz_strength

    # OTE alignment
    ote_pos = calculate_ote_position(df)
    if ote_pos < 0.38:  # Discount
        ote_alignment = 0.38 - ote_pos  # Better the lower it goes
    elif ote_pos > 0.62:  # Premium
        ote_alignment = ote_pos - 0.62  # Better the higher it goes
    else:
        ote_alignment = 0  # Near equilibrium
    score += 0.30 * min(ote_alignment * 3, 1.0)

    # Range position
    daily_pos = calculate_daily_range_position(df)
    range_score = abs(daily_pos - 0.5) * 2  # Better at the extremes
    score += 0.20 * range_score

    # MM model
    mm_model = detect_market_maker_model(df)
    if mm_model['model'] != 'none':
        score += 0.10 * mm_model['confidence']

    return score
```
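The OTE branch is easiest to see in isolation. The helper name `ote_alignment_score` below is ours, not part of the module; it simply restates the discount/premium logic above as a standalone function:

```python
def ote_alignment_score(ote_pos):
    # Reward positions deep in discount (<0.38) or premium (>0.62);
    # the raw distance from the band edge is scaled by 3 and capped at 1.0.
    if ote_pos < 0.38:
        ote_alignment = 0.38 - ote_pos
    elif ote_pos > 0.62:
        ote_alignment = ote_pos - 0.62
    else:
        ote_alignment = 0
    return min(ote_alignment * 3, 1.0)

print(round(ote_alignment_score(0.10), 2))  # deep discount -> 0.84
print(ote_alignment_score(0.50))            # equilibrium -> 0
print(round(ote_alignment_score(0.95), 2))  # deep premium -> 0.99
```

The factor of 3 means the sub-score saturates at 1.0 once price sits more than one third of the range beyond the band edge.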

---

## Feature Engineering Pipeline

### Complete Pipeline

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler


class FeatureEngineeringPipeline:
    """Complete feature engineering pipeline"""

    def __init__(self, config=None):
        self.config = config or self._default_config()
        self.scaler = RobustScaler()
        self.feature_names = []

    def fit_transform(self, df, timestamps=None):
        """Extracts and normalizes all features"""

        # 1. Extract all feature groups
        price_features = extract_price_action_features(df)
        volume_features = extract_volume_features(df)
        volatility_features = extract_volatility_features(df)
        trend_features = extract_trend_features(df)
        structure_features = extract_market_structure_features(df)
        order_flow_features = extract_order_flow_features(df)
        liquidity_features = extract_liquidity_features(df)

        if timestamps is not None:
            ict_features = extract_ict_features(df, timestamps)
            time_features = extract_time_features(timestamps)
        else:
            ict_features = pd.DataFrame()
            time_features = pd.DataFrame()

        smc_features = extract_smc_features(df)

        # 2. Combine all features
        all_features = pd.concat([
            price_features,
            volume_features,
            volatility_features,
            trend_features,
            structure_features,
            order_flow_features,
            liquidity_features,
            ict_features,
            smc_features,
            time_features
        ], axis=1)

        # 3. Handle NaN (forward-fill gaps, then zero-fill leading NaN)
        all_features = all_features.ffill().fillna(0)

        # 4. Store feature names
        self.feature_names = all_features.columns.tolist()

        # 5. Scale features
        scaled_features = self.scaler.fit_transform(all_features)

        return scaled_features

    def transform(self, df, timestamps=None):
        """Transform only (uses the already-fitted scaler)"""
        # ... same extraction ...
        return self.scaler.transform(all_features)

    def get_feature_importance(self, model, top_n=20):
        """Returns the top feature importances"""
        importance = pd.DataFrame({
            'feature': self.feature_names,
            'importance': model.feature_importances_
        }).sort_values('importance', ascending=False)

        return importance.head(top_n)
```
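Step 3 deserves a note: forward-fill propagates the last valid value through interior gaps, and the trailing `fillna(0)` only catches leading NaN that have no prior value. A minimal illustration (column names invented):

```python
import numpy as np
import pandas as pd

features = pd.DataFrame({
    'rsi': [np.nan, 0.4, np.nan, 0.6],  # leading NaN plus an interior gap
    'atr': [np.nan, np.nan, 1.2, 1.3],  # two leading NaN
})
cleaned = features.ffill().fillna(0)

print(cleaned['rsi'].tolist())  # -> [0.0, 0.4, 0.4, 0.6]
print(cleaned['atr'].tolist())  # -> [0.0, 0.0, 1.2, 1.3]
```

Forward-filling is safe here because it only propagates past values into later rows; filling in the other direction would leak future information into the features.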

---

## Validacion y Testing

### Metrics by Model

| Model | Primary Metric | Target | Secondary Metrics |
|-------|----------------|--------|-------------------|
| AMDDetector | Accuracy | >70% | Macro F1 >0.65, per-class precision >60% |
| RangePredictor | MAE | <0.003 | R² >0.3, directional accuracy >90% |
| TPSLClassifier | AUC | >0.85 | Accuracy >80%, precision >75% |
| LiquidityHunter | Precision | >70% | Recall >60%, F1 >0.65 |
| ICTContextModel | - | - | Qualitative validation |
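The classification metrics in the table can all be computed with scikit-learn; the labels and probabilities below are invented purely to show the calls:

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]              # hard labels for accuracy / F1
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6]  # scores for AUC

print(round(accuracy_score(y_true, y_pred), 3))             # 4 of 6 correct
print(round(f1_score(y_true, y_pred, average='macro'), 3))  # unweighted mean of per-class F1
print(round(roc_auc_score(y_true, y_prob), 3))              # ranking quality of the raw scores
```

Note that AUC is computed from the probabilities, not from the thresholded labels, so it measures how well the model ranks positives above negatives regardless of the cutoff.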

### Temporal Validation

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def temporal_validation(model, X, y, n_splits=5):
    """
    Cross-validation that respects temporal ordering.
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    scores = []

    for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]

        model.fit(X_train, y_train)
        y_pred = model.predict(X_val)

        score = calculate_metrics(y_val, y_pred)
        scores.append(score)

    return np.mean(scores), np.std(scores)
```
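`TimeSeriesSplit` produces expanding training windows whose indices always precede the validation indices, which is what prevents look-ahead leakage here:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)
for train_idx, val_idx in TimeSeriesSplit(n_splits=3).split(X):
    # Training always ends before validation begins
    print(train_idx.max() < val_idx.min(), len(train_idx), len(val_idx))
```

With 10 samples and 3 splits, the folds train on 4, 6, and 8 samples and validate on the next 2 each time; unlike `KFold`, no shuffling ever mixes future rows into the training set.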

---

**Document Generated:** 2025-12-08
**Trading Strategist - Trading Platform**