---
id: "FEATURES-TARGETS-COMPLETO"
title: "Complete Features and Targets - Trading Platform ML Models"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---
# Complete Features and Targets - Trading Platform ML Models
**Version:** 2.0.0
**Date:** 2025-12-08
**Module:** OQI-006-ml-signals
**Author:** Trading Strategist - Trading Platform
---
## Table of Contents
1. [Overview](#overview)
2. [Features by Category](#features-by-category)
3. [Targets by Model](#targets-by-model)
4. [Feature Engineering Pipeline](#feature-engineering-pipeline)
5. [Validation and Testing](#validation-and-testing)
---
## Overview
### Feature Summary
| Category | Count | Models using them |
|----------|-------|-------------------|
| Price Action | 12 | AMD, Range, TPSL |
| Volume | 10 | AMD, Range, TPSL |
| Volatility | 8 | AMD, Range, TPSL |
| Trend | 10 | AMD, Range, TPSL |
| Market Structure | 12 | AMD, SMC |
| Order Flow | 10 | AMD, Liquidity |
| Liquidity | 8 | Liquidity, TPSL |
| ICT | 15 | ICT Context, Range |
| SMC | 12 | AMD, TPSL |
| Time | 6 | All |
| **TOTAL** | **103** | - |
### Target Summary
| Model | Target Type | Classes/Values | Horizon |
|-------|-------------|----------------|---------|
| AMDDetector | Multiclass | 4 (neutral, acc, manip, dist) | 20 bars |
| RangePredictor | Regression | delta_high, delta_low | 15m, 1h |
| TPSLClassifier | Binary | 0/1 (SL first / TP first) | Variable |
| LiquidityHunter | Binary | 0/1 (no sweep / sweep) | 10 bars |
| ICTContextModel | Continuous | 0-1 score | Current |
---
## Features by Category
### 1. Price Action Features (12)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `range_ratio` | `(high - low) / SMA(high - low, 20)` | float | AMD, Range |
| 2 | `range_pct` | `(high - low) / close` | float | AMD, Range |
| 3 | `body_size` | `abs(close - open) / (high - low + 1e-8)` | float | AMD |
| 4 | `upper_wick` | `(high - max(close, open)) / (high - low + 1e-8)` | float | AMD |
| 5 | `lower_wick` | `(min(close, open) - low) / (high - low + 1e-8)` | float | AMD |
| 6 | `buying_pressure` | `(close - low) / (high - low + 1e-8)` | float | AMD, Range |
| 7 | `selling_pressure` | `(high - close) / (high - low + 1e-8)` | float | AMD, Range |
| 8 | `close_position` | `(close - low) / (high - low + 1e-8)` | float | AMD |
| 9 | `range_expansion` | `range_ratio > 1.3` | binary | AMD |
| 10 | `range_compression` | `range_ratio < 0.7` | binary | AMD |
| 11 | `gap_up` | `open > high.shift(1)` | binary | Range |
| 12 | `gap_down` | `open < low.shift(1)` | binary | Range |
```python
import numpy as np
import pandas as pd

def extract_price_action_features(df):
    """Extract price action features."""
f = {}
hl_range = df['high'] - df['low']
hl_range_safe = hl_range.replace(0, 1e-8)
f['range_ratio'] = hl_range / hl_range.rolling(20).mean()
f['range_pct'] = hl_range / df['close']
f['body_size'] = abs(df['close'] - df['open']) / hl_range_safe
f['upper_wick'] = (df['high'] - df[['close', 'open']].max(axis=1)) / hl_range_safe
f['lower_wick'] = (df[['close', 'open']].min(axis=1) - df['low']) / hl_range_safe
f['buying_pressure'] = (df['close'] - df['low']) / hl_range_safe
f['selling_pressure'] = (df['high'] - df['close']) / hl_range_safe
f['close_position'] = f['buying_pressure']
f['range_expansion'] = (f['range_ratio'] > 1.3).astype(int)
f['range_compression'] = (f['range_ratio'] < 0.7).astype(int)
f['gap_up'] = (df['open'] > df['high'].shift(1)).astype(int)
f['gap_down'] = (df['open'] < df['low'].shift(1)).astype(int)
return pd.DataFrame(f)
```
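A minimal usage sketch of the `buying_pressure` / `selling_pressure` rows above, on synthetic OHLC data (the synthetic frame is an assumption for illustration, not real market data):

```python
import numpy as np
import pandas as pd

# Synthetic OHLC bars; column names match the table above
rng = np.random.default_rng(42)
close = 100 + rng.normal(0, 1, 100).cumsum()
df = pd.DataFrame({'open': close + rng.normal(0, 0.2, 100), 'close': close})
df['high'] = df[['open', 'close']].max(axis=1) + rng.uniform(0.01, 0.5, 100)
df['low'] = df[['open', 'close']].min(axis=1) - rng.uniform(0.01, 0.5, 100)

hl_range = df['high'] - df['low']
hl_range_safe = hl_range.replace(0, 1e-8)
buying_pressure = (df['close'] - df['low']) / hl_range_safe
selling_pressure = (df['high'] - df['close']) / hl_range_safe
# For non-degenerate bars both pressures lie in [0, 1] and sum to 1
```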
### 2. Volume Features (10)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `volume_ratio` | `volume / SMA(volume, 20)` | float | AMD, Range |
| 2 | `volume_trend` | `SMA(volume, 10) - SMA(volume, 30)` | float | AMD |
| 3 | `volume_spike` | `volume > SMA(volume, 20) * 2` | binary | AMD |
| 4 | `obv` | On Balance Volume | float | AMD |
| 5 | `obv_slope` | `(OBV - OBV.shift(5)) / 5` | float | AMD |
| 6 | `vwap` | Volume Weighted Average Price | float | Range |
| 7 | `vwap_distance` | `(close - vwap) / vwap` | float | Range |
| 8 | `volume_on_up` | `sum(vol if close > open) / total_vol` (20 bars) | float | AMD |
| 9 | `volume_on_down` | `sum(vol if close < open) / total_vol` (20 bars) | float | AMD |
| 10 | `volume_imbalance` | `volume_on_up - volume_on_down` | float | AMD |
```python
def extract_volume_features(df):
    """Extract volume features."""
f = {}
f['volume_ratio'] = df['volume'] / df['volume'].rolling(20).mean()
f['volume_trend'] = df['volume'].rolling(10).mean() - df['volume'].rolling(30).mean()
f['volume_spike'] = (df['volume'] > df['volume'].rolling(20).mean() * 2).astype(int)
# OBV
obv_direction = ((df['close'] > df['close'].shift(1)).astype(int) * 2 - 1)
f['obv'] = (df['volume'] * obv_direction).cumsum()
f['obv_slope'] = f['obv'].diff(5) / 5
# VWAP
typical_price = (df['high'] + df['low'] + df['close']) / 3
f['vwap'] = (typical_price * df['volume']).cumsum() / df['volume'].cumsum()
f['vwap_distance'] = (df['close'] - f['vwap']) / f['vwap']
# Volume distribution
up_bars = (df['close'] > df['open']).astype(int)
down_bars = (df['close'] < df['open']).astype(int)
f['volume_on_up'] = (df['volume'] * up_bars).rolling(20).sum() / df['volume'].rolling(20).sum()
f['volume_on_down'] = (df['volume'] * down_bars).rolling(20).sum() / df['volume'].rolling(20).sum()
f['volume_imbalance'] = f['volume_on_up'] - f['volume_on_down']
return pd.DataFrame(f)
```
### 3. Volatility Features (8)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `atr` | Average True Range (14) | float | All |
| 2 | `atr_ratio` | `ATR / SMA(ATR, 50)` | float | AMD, Range |
| 3 | `atr_percentile` | ATR percentile (100 bars) | float | Range |
| 4 | `volatility_10` | `std(returns, 10)` | float | Range |
| 5 | `volatility_20` | `std(returns, 20)` | float | Range |
| 6 | `volatility_50` | `std(returns, 50)` | float | Range |
| 7 | `volatility_ratio` | `volatility_10 / volatility_50` | float | AMD |
| 8 | `bollinger_width` | `(BB_upper - BB_lower) / BB_middle` | float | Range |
```python
def extract_volatility_features(df):
    """Extract volatility features."""
f = {}
# ATR
tr = pd.concat([
df['high'] - df['low'],
abs(df['high'] - df['close'].shift(1)),
abs(df['low'] - df['close'].shift(1))
], axis=1).max(axis=1)
f['atr'] = tr.rolling(14).mean()
f['atr_ratio'] = f['atr'] / f['atr'].rolling(50).mean()
f['atr_percentile'] = f['atr'].rolling(100).apply(
lambda x: pd.Series(x).rank(pct=True).iloc[-1]
)
# Returns volatility
returns = df['close'].pct_change()
f['volatility_10'] = returns.rolling(10).std()
f['volatility_20'] = returns.rolling(20).std()
f['volatility_50'] = returns.rolling(50).std()
f['volatility_ratio'] = f['volatility_10'] / f['volatility_50']
# Bollinger Width
sma_20 = df['close'].rolling(20).mean()
std_20 = df['close'].rolling(20).std()
bb_upper = sma_20 + 2 * std_20
bb_lower = sma_20 - 2 * std_20
f['bollinger_width'] = (bb_upper - bb_lower) / sma_20
return pd.DataFrame(f)
```
### 4. Trend Features (10)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `sma_10` | Simple Moving Average (10) | float | Range |
| 2 | `sma_20` | Simple Moving Average (20) | float | Range |
| 3 | `sma_50` | Simple Moving Average (50) | float | Range |
| 4 | `close_sma_10_ratio` | `close / SMA_10` | float | AMD |
| 5 | `close_sma_20_ratio` | `close / SMA_20` | float | AMD |
| 6 | `sma_slope_20` | `(SMA_20 - SMA_20.shift(5)) / 5` | float | AMD |
| 7 | `trend_strength` | `abs(sma_slope_20) / ATR` | float | AMD |
| 8 | `adx` | Average Directional Index (14) | float | AMD |
| 9 | `plus_di` | +DI (14) | float | Range |
| 10 | `minus_di` | -DI (14) | float | Range |
```python
def extract_trend_features(df):
    """Extract trend features."""
f = {}
# SMAs
f['sma_10'] = df['close'].rolling(10).mean()
f['sma_20'] = df['close'].rolling(20).mean()
f['sma_50'] = df['close'].rolling(50).mean()
f['close_sma_10_ratio'] = df['close'] / f['sma_10']
f['close_sma_20_ratio'] = df['close'] / f['sma_20']
f['sma_slope_20'] = f['sma_20'].diff(5) / 5
# Trend strength
atr = calculate_atr(df, 14)
f['trend_strength'] = abs(f['sma_slope_20']) / atr
# ADX calculation
f['adx'], f['plus_di'], f['minus_di'] = calculate_adx(df, 14)
return pd.DataFrame(f)
def calculate_adx(df, period=14):
    """Compute ADX, +DI, -DI."""
plus_dm = df['high'].diff()
minus_dm = -df['low'].diff()
plus_dm = plus_dm.where((plus_dm > minus_dm) & (plus_dm > 0), 0)
minus_dm = minus_dm.where((minus_dm > plus_dm) & (minus_dm > 0), 0)
tr = pd.concat([
df['high'] - df['low'],
abs(df['high'] - df['close'].shift(1)),
abs(df['low'] - df['close'].shift(1))
], axis=1).max(axis=1)
atr = tr.rolling(period).mean()
plus_di = 100 * (plus_dm.rolling(period).mean() / atr)
minus_di = 100 * (minus_dm.rolling(period).mean() / atr)
    dx = 100 * abs(plus_di - minus_di) / (plus_di + minus_di + 1e-8)  # guard against div-by-zero
adx = dx.rolling(period).mean()
return adx, plus_di, minus_di
```
### 5. Market Structure Features (12)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `higher_highs_count` | Count of HH in 20 bars | int | AMD |
| 2 | `higher_lows_count` | Count of HL in 20 bars | int | AMD |
| 3 | `lower_highs_count` | Count of LH in 20 bars | int | AMD |
| 4 | `lower_lows_count` | Count of LL in 20 bars | int | AMD |
| 5 | `swing_high_distance` | Distance to nearest swing high | float | Liquidity |
| 6 | `swing_low_distance` | Distance to nearest swing low | float | Liquidity |
| 7 | `bos_bullish_count` | Count of bullish BOS in 30 bars | int | SMC |
| 8 | `bos_bearish_count` | Count of bearish BOS in 30 bars | int | SMC |
| 9 | `choch_bullish_count` | Count of bullish CHOCH in 30 bars | int | SMC |
| 10 | `choch_bearish_count` | Count of bearish CHOCH in 30 bars | int | SMC |
| 11 | `structure_score` | Structure score (-1 to +1) | float | AMD |
| 12 | `structure_alignment` | Alignment with trend | binary | Range |
```python
def extract_market_structure_features(df, lookback=20):
    """Extract market structure features."""
f = {}
# Higher highs/lows, Lower highs/lows
f['higher_highs_count'] = (df['high'] > df['high'].shift(1)).rolling(lookback).sum()
f['higher_lows_count'] = (df['low'] > df['low'].shift(1)).rolling(lookback).sum()
f['lower_highs_count'] = (df['high'] < df['high'].shift(1)).rolling(lookback).sum()
f['lower_lows_count'] = (df['low'] < df['low'].shift(1)).rolling(lookback).sum()
# Swing distances
swing_highs = detect_swing_points(df, 'high', lookback)
swing_lows = detect_swing_points(df, 'low', lookback)
f['swing_high_distance'] = calculate_distance_to_nearest(df['close'], swing_highs, 'above')
f['swing_low_distance'] = calculate_distance_to_nearest(df['close'], swing_lows, 'below')
# BOS and CHOCH counts
bos_signals = detect_bos(df, lookback)
choch_signals = detect_choch(df, lookback)
f['bos_bullish_count'] = count_signals(bos_signals, 'bullish', 30)
f['bos_bearish_count'] = count_signals(bos_signals, 'bearish', 30)
f['choch_bullish_count'] = count_signals(choch_signals, 'bullish', 30)
f['choch_bearish_count'] = count_signals(choch_signals, 'bearish', 30)
# Structure score
bullish_points = f['higher_highs_count'] + f['higher_lows_count']
bearish_points = f['lower_highs_count'] + f['lower_lows_count']
total_points = bullish_points + bearish_points + 1e-8
f['structure_score'] = (bullish_points - bearish_points) / total_points
# Structure alignment
trend_direction = np.sign(df['close'].rolling(20).mean().diff(5))
f['structure_alignment'] = (np.sign(f['structure_score']) == trend_direction).astype(int)
return pd.DataFrame(f)
```
### 6. Order Flow Features (10)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `order_blocks_bullish` | Count of bullish OB in 30 bars | int | AMD |
| 2 | `order_blocks_bearish` | Count of bearish OB in 30 bars | int | AMD |
| 3 | `ob_net` | `OB_bullish - OB_bearish` | int | AMD |
| 4 | `fvg_bullish_count` | Count of unfilled bullish FVG | int | Range |
| 5 | `fvg_bearish_count` | Count of unfilled bearish FVG | int | Range |
| 6 | `fvg_nearest_distance` | Distance to nearest FVG | float | Range |
| 7 | `false_breakout_count` | Count of false breakouts in 30 bars | int | AMD |
| 8 | `whipsaw_intensity` | Frequency of rapid reversals | float | AMD |
| 9 | `reversal_count` | Count of reversals in 20 bars | int | AMD |
| 10 | `displacement_strength` | Strength of the last displacement | float | SMC |
```python
def extract_order_flow_features(df, lookback=30):
    """Extract order flow features."""
f = {}
# Order blocks
ob_bullish = identify_order_blocks(df, 'bullish')
ob_bearish = identify_order_blocks(df, 'bearish')
f['order_blocks_bullish'] = count_recent(ob_bullish, lookback)
f['order_blocks_bearish'] = count_recent(ob_bearish, lookback)
f['ob_net'] = f['order_blocks_bullish'] - f['order_blocks_bearish']
# Fair Value Gaps
fvg_bullish = identify_fvg(df, 'bullish')
fvg_bearish = identify_fvg(df, 'bearish')
f['fvg_bullish_count'] = count_unfilled_fvg(fvg_bullish, df['close'])
f['fvg_bearish_count'] = count_unfilled_fvg(fvg_bearish, df['close'])
f['fvg_nearest_distance'] = calculate_nearest_fvg_distance(df, fvg_bullish + fvg_bearish)
# False breakouts and whipsaws
f['false_breakout_count'] = count_false_breakouts(df, lookback)
f['whipsaw_intensity'] = calculate_whipsaw_intensity(df, lookback)
# Reversals
price_changes = df['close'].pct_change()
reversals = ((price_changes > 0.005) & (price_changes.shift(-1) < -0.005)) | \
((price_changes < -0.005) & (price_changes.shift(-1) > 0.005))
f['reversal_count'] = reversals.rolling(20).sum()
# Displacement
f['displacement_strength'] = calculate_displacement_strength(df)
return pd.DataFrame(f)
```
### 7. Liquidity Features (8)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `bsl_distance` | Distance to Buy Side Liquidity | float | Liquidity |
| 2 | `ssl_distance` | Distance to Sell Side Liquidity | float | Liquidity |
| 3 | `bsl_strength` | Number of stops accumulated above | int | Liquidity |
| 4 | `ssl_strength` | Number of stops accumulated below | int | Liquidity |
| 5 | `liquidity_grab_count` | Count of recent grabs (20 bars) | int | AMD |
| 6 | `time_since_bsl_sweep` | Bars since last BSL sweep | int | Liquidity |
| 7 | `time_since_ssl_sweep` | Bars since last SSL sweep | int | Liquidity |
| 8 | `liquidity_imbalance` | `(BSL_strength - SSL_strength) / total` | float | Liquidity |
```python
def extract_liquidity_features(df, lookback=20):
    """Extract liquidity features."""
f = {}
# Identify liquidity pools
swing_highs = df['high'].rolling(lookback, center=True).max()
swing_lows = df['low'].rolling(lookback, center=True).min()
# Distances to liquidity
f['bsl_distance'] = (swing_highs - df['close']) / df['close']
f['ssl_distance'] = (df['close'] - swing_lows) / df['close']
# Liquidity strength (number of swing points)
f['bsl_strength'] = count_swing_points_above(df, lookback)
f['ssl_strength'] = count_swing_points_below(df, lookback)
# Liquidity grabs
f['liquidity_grab_count'] = count_liquidity_grabs(df, lookback)
# Time since sweeps
bsl_sweeps = detect_bsl_sweeps(df)
ssl_sweeps = detect_ssl_sweeps(df)
f['time_since_bsl_sweep'] = bars_since_last(bsl_sweeps)
f['time_since_ssl_sweep'] = bars_since_last(ssl_sweeps)
# Liquidity imbalance
total_liquidity = f['bsl_strength'] + f['ssl_strength'] + 1e-8
f['liquidity_imbalance'] = (f['bsl_strength'] - f['ssl_strength']) / total_liquidity
return pd.DataFrame(f)
```
### 8. ICT Features (15)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `ote_position` | Position within Fibonacci range (0-1) | float | ICT |
| 2 | `in_discount_zone` | Price in 21-38% Fib zone | binary | ICT |
| 3 | `in_premium_zone` | Price in 62-79% Fib zone | binary | ICT |
| 4 | `in_ote_buy_zone` | Optimal buy zone (62-79%) | binary | ICT |
| 5 | `in_ote_sell_zone` | Optimal sell zone (21-38%) | binary | ICT |
| 6 | `is_london_kz` | In London Open Killzone | binary | ICT |
| 7 | `is_ny_kz` | In NY AM Killzone | binary | ICT |
| 8 | `is_asian_kz` | In Asian Killzone | binary | ICT |
| 9 | `killzone_strength` | Session strength (0-1) | float | ICT |
| 10 | `session_overlap` | In London/NY overlap | binary | ICT |
| 11 | `weekly_range_position` | Position within weekly range | float | ICT |
| 12 | `daily_range_position` | Position within daily range | float | ICT |
| 13 | `mmsm_detected` | Market Maker Sell Model | binary | ICT |
| 14 | `mmbm_detected` | Market Maker Buy Model | binary | ICT |
| 15 | `po3_phase` | Power of 3 phase (1-3) | int | ICT |
```python
def extract_ict_features(df, timestamps):
    """Extract ICT features."""
f = {}
# OTE zones
swing_high = df['high'].rolling(50).max()
swing_low = df['low'].rolling(50).min()
    range_size = (swing_high - swing_low).replace(0, 1e-8)  # guard against a zero range
    f['ote_position'] = (df['close'] - swing_low) / range_size
f['in_discount_zone'] = ((f['ote_position'] >= 0.21) & (f['ote_position'] <= 0.38)).astype(int)
f['in_premium_zone'] = ((f['ote_position'] >= 0.62) & (f['ote_position'] <= 0.79)).astype(int)
f['in_ote_buy_zone'] = f['in_discount_zone']
f['in_ote_sell_zone'] = f['in_premium_zone']
# Killzones
killzones = identify_killzones(timestamps)
f['is_london_kz'] = (killzones == 'london_open').astype(int)
f['is_ny_kz'] = (killzones == 'ny_am').astype(int)
f['is_asian_kz'] = (killzones == 'asian').astype(int)
f['killzone_strength'] = get_killzone_strength(killzones)
f['session_overlap'] = ((killzones == 'london_close') | (killzones == 'ny_am')).astype(int)
# Range positions
f['weekly_range_position'] = calculate_weekly_position(df)
f['daily_range_position'] = calculate_daily_position(df)
# Market Maker Models
f['mmsm_detected'] = detect_mmsm(df)
f['mmbm_detected'] = detect_mmbm(df)
# Power of 3
f['po3_phase'] = calculate_po3_phase(df, timestamps)
return pd.DataFrame(f)
```
### 9. SMC Features (12)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `choch_bullish_recent` | Bullish CHOCH within 30 bars | binary | SMC |
| 2 | `choch_bearish_recent` | Bearish CHOCH within 30 bars | binary | SMC |
| 3 | `bos_bullish_recent` | Bullish BOS within 30 bars | binary | SMC |
| 4 | `bos_bearish_recent` | Bearish BOS within 30 bars | binary | SMC |
| 5 | `inducement_bullish` | Bullish inducement detected | binary | SMC |
| 6 | `inducement_bearish` | Bearish inducement detected | binary | SMC |
| 7 | `displacement_bullish` | Recent bullish displacement | binary | SMC |
| 8 | `displacement_bearish` | Recent bearish displacement | binary | SMC |
| 9 | `liquidity_void_distance` | Distance to nearest void | float | SMC |
| 10 | `structure_bullish_score` | Bullish structure score | float | SMC |
| 11 | `structure_bearish_score` | Bearish structure score | float | SMC |
| 12 | `smc_confluence_score` | SMC confluence score | float | SMC |
```python
def extract_smc_features(df, lookback=30):
    """Extract SMC features."""
f = {}
# CHOCH
choch_signals = detect_choch(df, lookback)
f['choch_bullish_recent'] = has_recent_signal(choch_signals, 'bullish', 30)
f['choch_bearish_recent'] = has_recent_signal(choch_signals, 'bearish', 30)
# BOS
bos_signals = detect_bos(df, lookback)
f['bos_bullish_recent'] = has_recent_signal(bos_signals, 'bullish', 30)
f['bos_bearish_recent'] = has_recent_signal(bos_signals, 'bearish', 30)
# Inducement
inducements = detect_inducement(df)
f['inducement_bullish'] = has_recent_signal(inducements, 'bullish', 20)
f['inducement_bearish'] = has_recent_signal(inducements, 'bearish', 20)
# Displacement
displacements = detect_displacement(df)
f['displacement_bullish'] = has_recent_signal(displacements, 'bullish', 10)
f['displacement_bearish'] = has_recent_signal(displacements, 'bearish', 10)
# Liquidity voids
voids = detect_liquidity_voids(df)
f['liquidity_void_distance'] = calculate_nearest_void_distance(df['close'], voids)
# Structure scores
f['structure_bullish_score'] = calculate_bullish_structure_score(df)
f['structure_bearish_score'] = calculate_bearish_structure_score(df)
# SMC Confluence
f['smc_confluence_score'] = calculate_smc_confluence(f)
return pd.DataFrame(f)
```
### 10. Time Features (6)
| # | Feature | Calculation | Type | Model(s) |
|---|---------|-------------|------|-----------|
| 1 | `hour_sin` | `sin(2 * pi * hour / 24)` | float | All |
| 2 | `hour_cos` | `cos(2 * pi * hour / 24)` | float | All |
| 3 | `day_of_week` | Day of week (0-6) | int | All |
| 4 | `is_weekend` | Saturday or Sunday | binary | All |
| 5 | `time_in_session` | Minutes since session open | int | ICT |
| 6 | `minutes_to_close` | Minutes to session close | int | ICT |
```python
def extract_time_features(timestamps):
    """Extract time-based features."""
f = {}
hours = timestamps.hour
f['hour_sin'] = np.sin(2 * np.pi * hours / 24)
f['hour_cos'] = np.cos(2 * np.pi * hours / 24)
f['day_of_week'] = timestamps.dayofweek
f['is_weekend'] = (timestamps.dayofweek >= 5).astype(int)
# Session timing
f['time_in_session'] = calculate_time_in_session(timestamps)
f['minutes_to_close'] = calculate_minutes_to_close(timestamps)
return pd.DataFrame(f)
```
---
## Targets by Model
### 1. AMDDetector Target
**Type:** Multiclass Classification (4 classes)
| Class | Value | Description |
|-------|-------|-------------|
| Neutral | 0 | No clear phase defined |
| Accumulation | 1 | Accumulation phase |
| Manipulation | 2 | Manipulation phase |
| Distribution | 3 | Distribution phase |
**Labeling Method:**
```python
def label_amd_phase(df, i, forward_window=20):
    """
    Label the AMD phase based on future behavior.
    Criteria:
    - Accumulation: tight range + price rises afterwards
    - Manipulation: false breakouts + whipsaws
    - Distribution: volume on declines + price falls afterwards
    - Neutral: none of the above clearly met
    """
if i + forward_window >= len(df):
return 0 # neutral
future = df.iloc[i:i+forward_window]
current_price = df['close'].iloc[i]
    # Forward-looking metrics
price_range_pct = (future['high'].max() - future['low'].min()) / current_price
final_price = future['close'].iloc[-1]
price_change = (final_price - current_price) / current_price
    # Volume
volume_first_half = future['volume'].iloc[:10].mean()
volume_second_half = future['volume'].iloc[10:].mean()
# False breakouts
false_breaks = count_false_breakouts_forward(df, i, forward_window)
    # ACCUMULATION criteria
    if price_range_pct < 0.02:  # Range < 2%
        if price_change > 0.01:  # Rises > 1% afterwards
            if volume_second_half < volume_first_half:  # Declining volume
                return 1  # accumulation
    # MANIPULATION criteria
    if false_breaks >= 2:  # 2+ false breakouts
        whipsaw_count = count_whipsaws_forward(df, i, forward_window)
        if whipsaw_count >= 3:
            return 2  # manipulation
    # DISTRIBUTION criteria
    if price_change < -0.015:  # Falls > 1.5%
        # High volume on declines
        down_volume = calculate_volume_on_down_moves(future)
        if down_volume > 0.6:  # 60%+ of volume on down moves
            return 3  # distribution
return 0 # neutral
def count_false_breakouts_forward(df, i, window):
    """Count false breakouts in the forward window."""
future = df.iloc[i:i+window]
resistance = df['high'].iloc[max(0,i-20):i].max()
support = df['low'].iloc[max(0,i-20):i].min()
false_breaks = 0
for j in range(1, len(future)):
# False breakout above
if future['high'].iloc[j] > resistance * 1.005:
if future['close'].iloc[j] < resistance:
false_breaks += 1
# False breakdown below
if future['low'].iloc[j] < support * 0.995:
if future['close'].iloc[j] > support:
false_breaks += 1
return false_breaks
```
**Expected Class Balance:**
- Neutral: ~40%
- Accumulation: ~20%
- Manipulation: ~20%
- Distribution: ~20%
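Once labels are produced, the balance can be checked programmatically. A sketch assuming labels are available as a Series; the synthetic array below stands in for real `label_amd_phase` output:

```python
import numpy as np
import pandas as pd

# Stand-in labels drawn at the expected proportions (replace with real labels)
labels = pd.Series(
    np.random.default_rng(0).choice([0, 1, 2, 3], size=10_000, p=[0.4, 0.2, 0.2, 0.2])
)
dist = labels.value_counts(normalize=True).sort_index()
expected = pd.Series({0: 0.40, 1: 0.20, 2: 0.20, 3: 0.20})
# Flag any class drifting more than 10 percentage points from expectation
drift = (dist - expected).abs()
imbalanced = drift[drift >= 0.10]
```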
### 2. RangePredictor Target
**Type:** Regression (continuous) + Binned Classification
**Regression Targets:**
| Target | Calculation | Horizon |
|--------|-------------|---------|
| `delta_high_15m` | `(max_high_3bars - close) / close` | 15 min |
| `delta_low_15m` | `(close - min_low_3bars) / close` | 15 min |
| `delta_high_1h` | `(max_high_12bars - close) / close` | 1 hour |
| `delta_low_1h` | `(close - min_low_12bars) / close` | 1 hour |
**Binned Targets:**
| Bin | Range (ATR multiples) | Description |
|-----|----------------------|-------------|
| 0 | < 0.3 ATR | Very low |
| 1 | 0.3 - 0.7 ATR | Low |
| 2 | 0.7 - 1.2 ATR | Medium |
| 3 | > 1.2 ATR | High |
```python
def calculate_range_targets(df, horizons={'15m': 3, '1h': 12}):
    """Compute targets for RangePredictor."""
targets = {}
atr = calculate_atr(df, 14)
for name, periods in horizons.items():
# Regression targets
future_high = df['high'].rolling(periods).max().shift(-periods)
future_low = df['low'].rolling(periods).min().shift(-periods)
targets[f'delta_high_{name}'] = (future_high - df['close']) / df['close']
targets[f'delta_low_{name}'] = (df['close'] - future_low) / df['close']
# Binned targets
for target_type in ['high', 'low']:
delta = targets[f'delta_{target_type}_{name}']
atr_ratio = delta / (atr / df['close'])
bins = pd.cut(
atr_ratio,
bins=[-np.inf, 0.3, 0.7, 1.2, np.inf],
labels=[0, 1, 2, 3]
)
targets[f'bin_{target_type}_{name}'] = bins
return pd.DataFrame(targets)
```
### 3. TPSLClassifier Target
**Type:** Binary Classification
| Value | Description |
|-------|-------------|
| 0 | Stop Loss hit first |
| 1 | Take Profit hit first |
**R:R Configurations:**
| Config | SL Distance | TP Distance | R:R |
|--------|-------------|-------------|-----|
| `rr_2_1` | 0.3 ATR | 0.6 ATR | 2:1 |
| `rr_3_1` | 0.3 ATR | 0.9 ATR | 3:1 |
| `rr_4_1` | 0.25 ATR | 1.0 ATR | 4:1 |
```python
def calculate_tpsl_targets(df, horizons, rr_configs):
    """
    Compute targets for TPSLClassifier.
    Returns 1 if TP is hit first, 0 if SL is hit first, NaN if neither.
    """
targets = {}
atr = calculate_atr(df, 14)
for horizon_name, max_bars in horizons.items():
for rr in rr_configs:
target_name = f'tp_first_{horizon_name}_{rr["name"]}'
sl_distance = atr * rr['sl_atr_multiple']
tp_distance = atr * rr['tp_atr_multiple']
results = []
for i in range(len(df)):
if i + max_bars >= len(df):
results.append(np.nan)
continue
entry_price = df['close'].iloc[i]
sl_price = entry_price - sl_distance.iloc[i]
tp_price = entry_price + tp_distance.iloc[i]
                # Simulate forward
result = simulate_trade_outcome(
df.iloc[i+1:i+max_bars+1],
entry_price,
sl_price,
tp_price
)
results.append(result)
targets[target_name] = results
return pd.DataFrame(targets)
def simulate_trade_outcome(future_bars, entry, sl, tp):
    """
    Simulate the trade outcome.
    Returns: 1 (TP first), 0 (SL first), NaN (neither)
    """
for _, row in future_bars.iterrows():
# Check SL first (assuming worst case)
if row['low'] <= sl:
return 0
# Check TP
if row['high'] >= tp:
return 1
return np.nan # Neither hit within window
```
### 4. LiquidityHunter Target
**Type:** Binary Classification
| Value | Description |
|-------|-------------|
| 0 | No liquidity sweep |
| 1 | Liquidity sweep occurs |
**Sweep Types:**
| Target | Description |
|--------|-------------|
| `bsl_sweep` | Buy Side Liquidity sweep |
| `ssl_sweep` | Sell Side Liquidity sweep |
| `any_sweep` | Any sweep |
```python
def calculate_liquidity_targets(df, forward_window=10, sweep_threshold=0.005):
    """Compute targets for LiquidityHunter."""
targets = {}
for i in range(len(df) - forward_window):
# Current liquidity levels
lookback = df.iloc[max(0, i-20):i]
swing_high = lookback['high'].max()
swing_low = lookback['low'].min()
# Future price action
future = df.iloc[i:i+forward_window]
# BSL sweep (price goes above swing high then reverses)
bsl_level = swing_high * (1 + sweep_threshold)
bsl_swept = (future['high'] >= bsl_level).any()
bsl_reversed = bsl_swept and (future['close'].iloc[-1] < swing_high)
# SSL sweep (price goes below swing low then reverses)
ssl_level = swing_low * (1 - sweep_threshold)
ssl_swept = (future['low'] <= ssl_level).any()
ssl_reversed = ssl_swept and (future['close'].iloc[-1] > swing_low)
targets.setdefault('bsl_sweep', []).append(1 if bsl_reversed else 0)
targets.setdefault('ssl_sweep', []).append(1 if ssl_reversed else 0)
targets.setdefault('any_sweep', []).append(1 if (bsl_reversed or ssl_reversed) else 0)
    # Pad the trailing indices
for key in targets:
targets[key].extend([np.nan] * forward_window)
return pd.DataFrame(targets)
```
### 5. ICTContextModel Target
**Type:** Continuous Score (0-1)
This model has no traditional target; instead it computes a real-time score based on ICT context.
```python
def calculate_ict_context_score(df, timestamps):
    """
    Compute the ICT context score (0-1).
    Factors:
    - Killzone strength (40%)
    - OTE position alignment (30%)
    - Range position (20%)
    - MM model detection (10%)
    """
score = 0.0
# Killzone
killzone = identify_killzone(timestamps.iloc[-1])
kz_strength = get_killzone_strength(killzone)
score += 0.40 * kz_strength
# OTE alignment
ote_pos = calculate_ote_position(df)
if ote_pos < 0.38: # Discount
ote_alignment = 0.38 - ote_pos # Better if lower
elif ote_pos > 0.62: # Premium
ote_alignment = ote_pos - 0.62 # Better if higher
else:
ote_alignment = 0 # Near equilibrium
score += 0.30 * min(ote_alignment * 3, 1.0)
# Range position
daily_pos = calculate_daily_range_position(df)
range_score = abs(daily_pos - 0.5) * 2 # Better at extremes
score += 0.20 * range_score
# MM model
mm_model = detect_market_maker_model(df)
if mm_model['model'] != 'none':
score += 0.10 * mm_model['confidence']
return score
```
---
## Feature Engineering Pipeline
### Complete Pipeline
```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

class FeatureEngineeringPipeline:
    """Complete feature engineering pipeline."""
def __init__(self, config=None):
self.config = config or self._default_config()
self.scaler = RobustScaler()
self.feature_names = []
    def fit_transform(self, df, timestamps=None):
        """Extract and normalize all features."""
# 1. Extract all feature groups
price_features = extract_price_action_features(df)
volume_features = extract_volume_features(df)
volatility_features = extract_volatility_features(df)
trend_features = extract_trend_features(df)
structure_features = extract_market_structure_features(df)
order_flow_features = extract_order_flow_features(df)
liquidity_features = extract_liquidity_features(df)
if timestamps is not None:
ict_features = extract_ict_features(df, timestamps)
time_features = extract_time_features(timestamps)
else:
ict_features = pd.DataFrame()
time_features = pd.DataFrame()
smc_features = extract_smc_features(df)
# 2. Combine all features
all_features = pd.concat([
price_features,
volume_features,
volatility_features,
trend_features,
structure_features,
order_flow_features,
liquidity_features,
ict_features,
smc_features,
time_features
], axis=1)
# 3. Handle NaN
        all_features = all_features.ffill().fillna(0)  # fillna(method='ffill') is deprecated in pandas 2.x
# 4. Store feature names
self.feature_names = all_features.columns.tolist()
# 5. Scale features
scaled_features = self.scaler.fit_transform(all_features)
return scaled_features
    def transform(self, df, timestamps=None):
        """Transform only (uses the already-fitted scaler)."""
# ... same extraction ...
return self.scaler.transform(all_features)
    def get_feature_importance(self, model, top_n=20):
        """Return the top-n feature importances."""
importance = pd.DataFrame({
'feature': self.feature_names,
'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
return importance.head(top_n)
```
---
## Validation and Testing
### Metrics per Model
| Model | Primary Metric | Target | Secondary Metrics |
|-------|----------------|--------|-------------------|
| AMDDetector | Accuracy | >70% | F1 macro >0.65, per-class Precision >60% |
| RangePredictor | MAE | <0.003 | R² >0.3, Directional Acc >90% |
| TPSLClassifier | AUC | >0.85 | Accuracy >80%, Precision >75% |
| LiquidityHunter | Precision | >70% | Recall >60%, F1 >0.65 |
| ICTContextModel | - | - | Qualitative validation |
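These thresholds map directly onto scikit-learn metrics. A minimal sketch on toy arrays (the array values are illustrative, not real model output):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, mean_absolute_error

# AMDDetector: multiclass accuracy and macro F1
y_true_amd = np.array([0, 1, 2, 3, 0, 1])
y_pred_amd = np.array([0, 1, 2, 0, 0, 1])
acc = accuracy_score(y_true_amd, y_pred_amd)
f1_macro = f1_score(y_true_amd, y_pred_amd, average='macro', zero_division=0)

# TPSLClassifier: AUC over predicted probabilities
y_true_tp = np.array([0, 1, 1, 0, 1])
y_prob_tp = np.array([0.2, 0.8, 0.7, 0.4, 0.9])
auc = roc_auc_score(y_true_tp, y_prob_tp)

# RangePredictor: MAE on the delta targets
y_true_rng = np.array([0.002, 0.004, 0.001])
y_pred_rng = np.array([0.0025, 0.0035, 0.0012])
mae = mean_absolute_error(y_true_rng, y_pred_rng)
```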
### Temporal Validation
```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def temporal_validation(model, X, y, n_splits=5):
    """Validation that respects temporal ordering."""
tscv = TimeSeriesSplit(n_splits=n_splits)
scores = []
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
X_train, X_val = X[train_idx], X[val_idx]
y_train, y_val = y[train_idx], y[val_idx]
model.fit(X_train, y_train)
y_pred = model.predict(X_val)
score = calculate_metrics(y_val, y_pred)
scores.append(score)
return np.mean(scores), np.std(scores)
```
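A self-contained usage sketch of the same walk-forward pattern (the RandomForest model and synthetic dataset are placeholders, not the production setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit

# Synthetic classification data with a learnable signal in the first feature
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

tscv = TimeSeriesSplit(n_splits=5)
scores = []
for train_idx, val_idx in tscv.split(X):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])   # train only on the past
    scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))

mean_acc, std_acc = np.mean(scores), np.std(scores)
```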
---
**Document Generated:** 2025-12-08
**Trading Strategist - Trading Platform**