---
id: "MODELOS-ML-DEFINICION"
title: "Arquitectura de Modelos ML - Trading Platform"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---

# ML Model Architecture - Trading Platform

**Version:** 1.0.0
**Date:** 2025-12-05
**Module:** OQI-006-ml-signals
**Author:** Trading Strategist - Trading Platform

---

## Table of Contents

1. [Overview](#overview)
2. [Model 1: AMDDetector](#model-1-amddetector)
3. [Model 2: RangePredictor](#model-2-rangepredictor)
4. [Model 3: TPSLClassifier](#model-3-tpslclassifier)
5. [Model 4: LiquidityHunter](#model-4-liquidityhunter)
6. [Model 5: OrderFlowAnalyzer](#model-5-orderflowanalyzer)
7. [Meta-Model: StrategyOrchestrator](#meta-model-strategyorchestrator)
8. [Training Pipeline](#training-pipeline)
9. [Metrics and Evaluation](#metrics-and-evaluation)
10. [Production and Deployment](#production-and-deployment)

---

## Overview

### System Architecture

```
                    ORBIQUANT IA ML SYSTEM
                    ======================

 ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
 │     AMD     │    │  Liquidity  │    │  OrderFlow  │
 │  Detector   │    │   Hunter    │    │  Analyzer   │
 │   (Phase)   │    │   (Hunt)    │    │   (Flow)    │
 └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
        │                  │                  │
        └──────────────────┼──────────────────┘
                           ▼
                  ┌─────────────────┐
                  │  Feature Union  │
                  │   (Combined)    │
                  └────────┬────────┘
           ┌───────────────┴───────────────┐
           ▼                               ▼
    ┌─────────────┐                 ┌─────────────┐
    │    Range    │                 │    TPSL     │
    │  Predictor  │                 │ Classifier  │
    │   (ΔH/ΔL)   │                 │   (P[TP])   │
    └──────┬──────┘                 └──────┬──────┘
           └───────────────┬───────────────┘
                           ▼
                  ┌─────────────────┐
                  │    Strategy     │
                  │  Orchestrator   │
                  │  (Meta-Model)   │
                  └────────┬────────┘
                           ▼
                  ┌─────────────────┐
                  │  Signal Output  │
                  │  BUY/SELL/HOLD  │
                  └─────────────────┘
```

### Data Flow

```
Market Data (OHLCV)
        │
        ▼
Feature Engineering (50+ features)
        │
        ├──────────────┬──────────────────┬──────────────┐
        ▼              ▼                  ▼              ▼
  AMDDetector   LiquidityHunter   OrderFlowAnalyzer   Base Features
        │              │                  │              │
        └──────────────┴────────┬─────────┴──────────────┘
                                ▼
              Combined Feature Vector (100+ dims)
                                │
                 ┌──────────────┴──────────────┐
                 ▼                             ▼
          RangePredictor                TPSLClassifier
                 │                             │
                 └──────────────┬──────────────┘
                                ▼
                      StrategyOrchestrator
                                │
                                ▼
                         Trading Signal
```

### Design Principles

1. **Modular**: each model is independent and reusable
2. **Scalable**: new models are easy to add
3. **Interpretable**: feature importance and explainability
4. **Robust**: strict temporal validation (no look-ahead bias)
5. **Production-ready**: API, monitoring, automatic retraining

---

## Model 1: AMDDetector

### Description

A multiclass classifier that identifies the current market phase according to the AMD framework (Accumulation-Manipulation-Distribution).
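The four-phase label space can be pinned down with a small helper. This is an illustrative sketch; `AMDPhase` and `decode_phase` are hypothetical names that simply mirror the `label_encoder` mapping used in the architecture below:

```python
from enum import IntEnum


class AMDPhase(IntEnum):
    """Class indices used by the AMD multiclass classifier."""
    NEUTRAL = 0
    ACCUMULATION = 1
    MANIPULATION = 2
    DISTRIBUTION = 3


def decode_phase(class_idx: int) -> str:
    """Map a predicted class index back to its phase label."""
    return AMDPhase(class_idx).name.lower()
```

For example, `decode_phase(1)` returns `'accumulation'`, matching the dictionary the detector uses internally.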
### Architecture

**Type:** XGBoost multiclass classifier
**Output:** probabilities for the 4 classes

```python
from xgboost import XGBClassifier
from sklearn.preprocessing import RobustScaler


class AMDDetector:
    """Detects AMD phases using XGBoost."""

    def __init__(self, config=None):
        self.config = config or self._default_config()
        self.model = self._init_model()
        self.scaler = RobustScaler()
        self.label_encoder = {
            0: 'neutral',
            1: 'accumulation',
            2: 'manipulation',
            3: 'distribution'
        }

    def _init_model(self):
        return XGBClassifier(
            objective='multi:softprob',
            num_class=4,
            n_estimators=300,
            max_depth=6,
            learning_rate=0.05,
            subsample=0.8,
            colsample_bytree=0.8,
            min_child_weight=5,
            gamma=0.2,
            reg_alpha=0.1,
            reg_lambda=1.0,
            scale_pos_weight=1.0,
            tree_method='hist',
            device='cuda',  # GPU support
            random_state=42
        )
```

### Input Features

**Dimension:** 50 features

| Category | Features | Count |
|----------|----------|-------|
| **Price Action** | range_ratio, body_size, wicks, etc. | 10 |
| **Volume** | volume_ratio, trend, OBV, etc. | 8 |
| **Volatility** | ATR, volatility_*, percentiles | 6 |
| **Trend** | SMAs, slopes, strength | 8 |
| **Market Structure** | higher_highs, lower_lows, BOS, CHOCH | 10 |
| **Order Flow** | order_blocks, FVG, liquidity_grabs | 8 |

```python
def extract_amd_features(df):
    """Extracts features for AMDDetector.

    Helpers such as calculate_atr, detect_order_blocks and detect_fvg
    are defined elsewhere in the module.
    """
    features = {}

    # Price action
    features['range_ratio'] = (df['high'] - df['low']) / df['high'].rolling(20).mean()
    features['body_size'] = abs(df['close'] - df['open']) / (df['high'] - df['low'])
    features['upper_wick'] = (df['high'] - df[['close', 'open']].max(axis=1)) / (df['high'] - df['low'])
    features['lower_wick'] = (df[['close', 'open']].min(axis=1) - df['low']) / (df['high'] - df['low'])
    features['buying_pressure'] = (df['close'] - df['low']) / (df['high'] - df['low'])
    features['selling_pressure'] = (df['high'] - df['close']) / (df['high'] - df['low'])

    # Volume
    features['volume_ratio'] = df['volume'] / df['volume'].rolling(20).mean()
    features['volume_trend'] = df['volume'].rolling(10).mean() - df['volume'].rolling(30).mean()
    features['obv'] = (df['volume'] * ((df['close'] > df['close'].shift(1)).astype(int) * 2 - 1)).cumsum()
    features['obv_slope'] = features['obv'].diff(5) / 5

    # Volatility
    features['atr'] = calculate_atr(df, 14)
    features['atr_ratio'] = features['atr'] / features['atr'].rolling(50).mean()
    features['volatility_10'] = df['close'].pct_change().rolling(10).std()
    features['volatility_20'] = df['close'].pct_change().rolling(20).std()

    # Trend
    features['sma_10'] = df['close'].rolling(10).mean()
    features['sma_20'] = df['close'].rolling(20).mean()
    features['sma_50'] = df['close'].rolling(50).mean()
    features['close_sma_ratio_20'] = df['close'] / features['sma_20']
    features['trend_slope'] = features['sma_20'].diff(5) / 5
    features['trend_strength'] = abs(features['trend_slope']) / features['atr']

    # Market structure
    features['higher_highs'] = (df['high'] > df['high'].shift(1)).rolling(10).sum()
    features['higher_lows'] = (df['low'] >
                              df['low'].shift(1)).rolling(10).sum()
    features['lower_highs'] = (df['high'] < df['high'].shift(1)).rolling(10).sum()
    features['lower_lows'] = (df['low'] < df['low'].shift(1)).rolling(10).sum()

    # Order flow
    features['order_blocks_bullish'] = detect_order_blocks(df, 'bullish')
    features['order_blocks_bearish'] = detect_order_blocks(df, 'bearish')
    features['fvg_count_bullish'] = detect_fvg(df, 'bullish')
    features['fvg_count_bearish'] = detect_fvg(df, 'bearish')

    return pd.DataFrame(features)
```

### Target Labeling

**Method:** forward-looking over a 20-period window

```python
def label_amd_phase(df, i, forward_window=20):
    """Labels the AMD phase based on future behavior."""
    if i + forward_window >= len(df):
        return 0  # neutral

    future = df.iloc[i:i + forward_window]
    current = df.iloc[i]

    # Calculate metrics
    price_range = (future['high'].max() - future['low'].min()) / current['close']
    volume_avg = future['volume'].mean()
    volume_std = future['volume'].std()
    price_end = future['close'].iloc[-1]
    price_start = current['close']

    # Accumulation criteria
    if price_range < 0.02:  # Tight range (< 2%)
        volume_declining = future['volume'].iloc[-5:].mean() < future['volume'].iloc[:5].mean()
        if volume_declining and price_end > price_start:
            return 1  # accumulation

    # Manipulation criteria
    false_breaks = count_false_breakouts(future)
    whipsaws = count_whipsaws(future)
    if false_breaks >= 2 or whipsaws >= 3:
        return 2  # manipulation

    # Distribution criteria
    if price_end < price_start * 0.98:  # Decline >= 2%
        volume_on_down = check_volume_on_down_moves(future)
        lower_highs = count_lower_highs(future)
        if volume_on_down and lower_highs >= 2:
            return 3  # distribution

    return 0  # neutral
```

### Output

```python
from dataclasses import dataclass
from typing import Dict

import pandas as pd


@dataclass
class AMDPrediction:
    phase: str  # 'neutral', 'accumulation', etc.
    confidence: float  # 0-1
    probabilities: Dict[str, float]  # {'neutral': 0.1, 'accumulation': 0.7, ...}
    strength: float  # 0-1
    characteristics: Dict  # Phase-specific metrics
    timestamp: pd.Timestamp


# Example
prediction = amd_detector.predict(current_data)
# {
#     'phase': 'accumulation',
#     'confidence': 0.78,
#     'probabilities': {
#         'neutral': 0.05,
#         'accumulation': 0.78,
#         'manipulation': 0.12,
#         'distribution': 0.05
#     },
#     'strength': 0.71,
#     'timestamp': '2025-12-05 14:30:00'
# }
```

### Evaluation Metrics

| Metric | Target | Actual |
|--------|--------|--------|
| **Overall Accuracy** | >70% | - |
| **Accumulation Precision** | >65% | - |
| **Manipulation Precision** | >60% | - |
| **Distribution Precision** | >65% | - |
| **Macro F1 Score** | >0.65 | - |
| **Weighted F1 Score** | >0.70 | - |

---

## Model 2: RangePredictor

### Description

A regression model that predicts delta_high and delta_low over multiple time horizons.

**See existing implementation:** `[LEGACY: apps/ml-engine - migrado desde TradingAgent]/src/models/range_predictor.py`

### Architecture

**Type:** XGBoost regressor + classifier (for bins)
**Horizons:** 15m (3 bars), 1h (12 bars), custom

```python
from xgboost import XGBClassifier, XGBRegressor


class RangePredictor:
    """Predicts future price ranges."""

    def __init__(self, config=None):
        self.config = config or self._default_config()
        self.horizons = ['15m', '1h']
        self.models = {}

        # Initialize models for each horizon
        for horizon in self.horizons:
            self.models[f'{horizon}_high_reg'] = XGBRegressor(**self.config['xgboost'])
            self.models[f'{horizon}_low_reg'] = XGBRegressor(**self.config['xgboost'])
            self.models[f'{horizon}_high_bin'] = XGBClassifier(**self.config['xgboost_classifier'])
            self.models[f'{horizon}_low_bin'] = XGBClassifier(**self.config['xgboost_classifier'])
```

### Input Features

**Dimension:** 70+ features (base + AMD)

```python
def prepare_range_features(df, amd_features):
    """Combines base features with the outputs of
    AMDDetector."""
    # Base technical features (21 existing)
    base_features = extract_technical_features(df)

    # AMD features (from AMDDetector)
    amd_enhanced = {
        'phase_encoded': encode_phase(amd_features['phase']),
        'phase_accumulation_prob': amd_features['probabilities']['accumulation'],
        'phase_manipulation_prob': amd_features['probabilities']['manipulation'],
        'phase_distribution_prob': amd_features['probabilities']['distribution'],
        'phase_strength': amd_features['strength'],
        'range_compression': amd_features['characteristics'].get('range_compression', 0),
        'order_blocks_net': (
            amd_features['characteristics'].get('order_blocks_bullish', 0) -
            amd_features['characteristics'].get('order_blocks_bearish', 0)
        )
    }

    # Liquidity features (from LiquidityHunter)
    liquidity_features = {
        'bsl_distance': calculate_bsl_distance(df),
        'ssl_distance': calculate_ssl_distance(df),
        'liquidity_grab_recent': count_recent_liquidity_grabs(df),
        'fvg_count': count_unfilled_fvg(df)
    }

    # ICT features
    ict_features = {
        'ote_position': calculate_ote_position(df),
        'in_premium_zone': 1 if is_premium_zone(df) else 0,
        'in_discount_zone': 1 if is_discount_zone(df) else 0,
        'killzone_strength': get_killzone_strength(df),
        'weekly_range_position': calculate_weekly_position(df),
        'daily_range_position': calculate_daily_position(df)
    }

    # SMC features
    smc_features = {
        'choch_bullish_count': count_choch(df, 'bullish'),
        'choch_bearish_count': count_choch(df, 'bearish'),
        'bos_bullish_count': count_bos(df, 'bullish'),
        'bos_bearish_count': count_bos(df, 'bearish'),
        'displacement_strength': calculate_displacement(df),
        'market_structure_score': calculate_structure_score(df)
    }

    # Combine all
    return pd.DataFrame({
        **base_features,
        **amd_enhanced,
        **liquidity_features,
        **ict_features,
        **smc_features
    })
```

### Targets

```python
def calculate_range_targets(df, horizons={'15m': 3, '1h': 12}):
    """Computes range targets for training."""
    targets = {}

    for name, periods in horizons.items():
        # Delta high/low
        targets[f'delta_high_{name}'] = (
            df['high'].rolling(periods).max().shift(-periods) - df['close']
        ) / df['close']
        targets[f'delta_low_{name}'] = (
            df['close'] - df['low'].rolling(periods).min().shift(-periods)
        ) / df['close']

        # Bins (volatility classification), measured in ATR units.
        # ATR is normalized by price so it is comparable element-wise
        # with the fractional deltas above.
        atr_pct = calculate_atr(df, 14) / df['close']

        def to_bin(ratio):
            if pd.isna(ratio):
                return np.nan
            if ratio < 0.3:
                return 0  # Very low
            elif ratio < 0.7:
                return 1  # Low
            elif ratio < 1.2:
                return 2  # Medium
            else:
                return 3  # High

        targets[f'bin_high_{name}'] = (targets[f'delta_high_{name}'] / atr_pct).apply(to_bin)
        targets[f'bin_low_{name}'] = (targets[f'delta_low_{name}'] / atr_pct).apply(to_bin)

    return pd.DataFrame(targets)
```

### Output

```python
@dataclass
class RangePrediction:
    horizon: str
    delta_high: float    # Predicted max price increase
    delta_low: float     # Predicted max price decrease
    delta_high_bin: int  # Volatility classification
    delta_low_bin: int
    confidence_high: float
    confidence_low: float
    predicted_high_price: float  # Absolute price
    predicted_low_price: float
    timestamp: pd.Timestamp


# Example
predictions = range_predictor.predict(features, current_price=89350)
# [
#     RangePrediction(
#         horizon='15m',
#         delta_high=0.0085,           # +0.85%
#         delta_low=0.0042,            # -0.42%
#         predicted_high_price=90109,  # 89350 * 1.0085
#         predicted_low_price=88975,   # 89350 * 0.9958
#         confidence_high=0.72,
#         confidence_low=0.68
#     ),
#     RangePrediction(horizon='1h', ...)
# ]
```

### Evaluation Metrics

| Horizon | MAE High | MAE Low | MAPE | Bin Accuracy | R² |
|---------|----------|---------|------|--------------|-----|
| **15m** | <0.003 | <0.003 | <0.5% | >65% | >0.3 |
| **1h** | <0.005 | <0.005 | <0.8% | >60% | >0.2 |

**Directional accuracy:**

- High predictions: target >95%
- Low predictions: target >50% (up from 4-19%)

---

## Model 3: TPSLClassifier

### Description

A binary classifier that predicts the probability that the Take Profit is reached before the Stop Loss.
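Once that probability is calibrated, one common way to score a setup is expected value in R units: a win pays the reward:risk ratio, a loss costs 1 R. This is a generic textbook formula, shown here as a sketch; it is not necessarily the exact EV definition used elsewhere in this document:

```python
def expected_value_r(prob_tp_first: float, reward_risk: float) -> float:
    """Expected value of a trade in R units.

    prob_tp_first: calibrated P(TP hit before SL)
    reward_risk:   reward:risk ratio, e.g. 2.0 for a 2:1 setup
    """
    return prob_tp_first * reward_risk - (1.0 - prob_tp_first)
```

Under this definition a 2:1 setup breaks even at P(TP first) = 1/3, which is why the probability threshold matters more than the raw ratio.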
**See existing implementation:** `[LEGACY: apps/ml-engine - migrado desde TradingAgent]/src/models/tp_sl_classifier.py`

### Architecture

**Type:** XGBoost binary classifier with calibration
**R:R configs:** multiple ratios (2:1, 3:1, custom)

```python
from xgboost import XGBClassifier


class TPSLClassifier:
    """Predicts the probability of TP before SL."""

    def __init__(self, config=None):
        self.config = config or self._default_config()
        self.horizons = ['15m', '1h']
        self.rr_configs = [
            {'name': 'rr_2_1', 'sl_atr_multiple': 0.3, 'tp_atr_multiple': 0.6},
            {'name': 'rr_3_1', 'sl_atr_multiple': 0.3, 'tp_atr_multiple': 0.9},
        ]
        self.models = {}
        self.calibrated_models = {}

        # Initialize models
        for horizon in self.horizons:
            for rr in self.rr_configs:
                key = f'{horizon}_{rr["name"]}'
                self.models[key] = XGBClassifier(**self.config['xgboost'])
```

### Input Features

**Dimension:** 80+ features (base + AMD + range predictions)

```python
def prepare_tpsl_features(df, amd_features, range_predictions):
    """Features for TPSLClassifier, including stacking."""
    # Base + AMD features (same as RangePredictor)
    base_features = prepare_range_features(df, amd_features)

    # Range predictions as features (stacking)
    range_stacking = {
        'pred_delta_high_15m': range_predictions['15m'].delta_high,
        'pred_delta_low_15m': range_predictions['15m'].delta_low,
        'pred_delta_high_1h': range_predictions['1h'].delta_high,
        'pred_delta_low_1h': range_predictions['1h'].delta_low,
        'pred_high_confidence': range_predictions['15m'].confidence_high,
        'pred_low_confidence': range_predictions['15m'].confidence_low,
        'pred_high_low_ratio': (
            range_predictions['15m'].delta_high /
            (range_predictions['15m'].delta_low + 1e-8)
        )
    }

    # R:R specific features
    rr_features = {
        'atr_current': calculate_atr(df, 14).iloc[-1],
        'volatility_regime': classify_volatility_regime(df),
        'trend_alignment': check_trend_alignment(df, amd_features),
        'liquidity_risk': calculate_liquidity_risk(df),
        'manipulation_risk': amd_features['probabilities']['manipulation']
    }
    return pd.DataFrame({
        **base_features,
        **range_stacking,
        **rr_features
    })
```

### Targets

```python
def calculate_tpsl_targets(df, horizons, rr_configs):
    """Computes whether TP is hit before SL (long-side entries)."""
    targets = {}
    atr = calculate_atr(df, 14)

    for horizon_name, periods in horizons.items():
        for rr in rr_configs:
            sl_distance = atr * rr['sl_atr_multiple']
            tp_distance = atr * rr['tp_atr_multiple']
            target_name = f'tp_first_{horizon_name}_{rr["name"]}'

            def check_tp_first(i):
                if i + periods >= len(df):
                    return np.nan
                entry = df['close'].iloc[i]
                sl_price = entry - sl_distance.iloc[i]
                tp_price = entry + tp_distance.iloc[i]
                future = df.iloc[i + 1:i + periods + 1]

                # Check which level is hit first. If both are touched within
                # the same bar, SL is assumed first (conservative labeling).
                for j, row in future.iterrows():
                    if row['low'] <= sl_price:
                        return 0  # SL hit first
                    elif row['high'] >= tp_price:
                        return 1  # TP hit first
                return np.nan  # Neither hit

            targets[target_name] = [check_tp_first(i) for i in range(len(df))]

    return pd.DataFrame(targets)
```

### Probability Calibration

```python
from sklearn.calibration import CalibratedClassifierCV


def calibrate_model(model, X_val, y_val):
    """Calibrates probabilities using isotonic regression."""
    calibrated = CalibratedClassifierCV(
        model,
        method='isotonic',  # or 'sigmoid'
        cv='prefit'
    )
    calibrated.fit(X_val, y_val)
    return calibrated


# Usage
tpsl_classifier.models['15m_rr_2_1'].fit(X_train, y_train)
tpsl_classifier.calibrated_models['15m_rr_2_1'] = calibrate_model(
    tpsl_classifier.models['15m_rr_2_1'],
    X_val, y_val
)
```

### Output

```python
@dataclass
class TPSLPrediction:
    horizon: str
    rr_config: str
    prob_tp_first: float     # P(TP before SL)
    prob_sl_first: float     # 1 - prob_tp_first
    recommended_action: str  # 'long', 'short', 'hold'
    confidence: float        # |prob - 0.5| * 2
    entry_price: float
    sl_price: float
    tp_price: float
    expected_value: float    # EV calculation
    timestamp: pd.Timestamp


# Example
predictions = tpsl_classifier.predict(
    features,
    current_price=89350,
    direction='long'
)
# [
#     TPSLPrediction(
#         horizon='15m',
#         rr_config='rr_2_1',
#         prob_tp_first=0.68,
#         recommended_action='long',
#         confidence=0.36,
#         entry_price=89350,
#         sl_price=89082,       # -0.3 ATR
#         tp_price=89886,       # +0.6 ATR
#         expected_value=0.136  # +13.6% EV
#     )
# ]
```

### Evaluation Metrics

| Metric | Target | Actual (Phase 2) |
|--------|--------|------------------|
| **Accuracy** | >80% | 85.9% |
| **Precision** | >75% | 82.1% |
| **Recall** | >75% | 85.7% |
| **F1 Score** | >0.75 | 0.84 |
| **ROC-AUC** | >0.85 | 0.94 |

---

## Model 4: LiquidityHunter

### Description

A model specialized in detecting liquidity zones and predicting "stop hunting" moves.

### Architecture

**Type:** XGBoost binary classifier
**Output:** probability of a liquidity sweep

```python
from xgboost import XGBClassifier
from sklearn.preprocessing import StandardScaler


class LiquidityHunter:
    """Detects and predicts stop hunts."""

    def __init__(self, config=None):
        self.config = config or self._default_config()
        self.model_bsl = XGBClassifier(**self.config['xgboost'])  # Buy-side liquidity
        self.model_ssl = XGBClassifier(**self.config['xgboost'])  # Sell-side liquidity
        self.scaler = StandardScaler()

    def _default_config(self):
        return {
            'lookback_swing': 20,      # Periods for swing points
            'sweep_threshold': 0.005,  # 0.5% beyond the level
            'xgboost': {
                'n_estimators': 200,
                'max_depth': 5,
                'learning_rate': 0.05,
                'scale_pos_weight': 2.0,  # Liquidity sweeps are rare
                'objective': 'binary:logistic',
                'eval_metric': 'auc'
            }
        }
```

### Input Features

**Dimension:** 30 specialized features

```python
def extract_liquidity_features(df, lookback=20):
    """Features for liquidity detection."""
    features = {}

    # Identify liquidity pools.
    # Note: center=True includes future bars; for live inference a
    # trailing window should be used to avoid look-ahead.
    swing_highs = df['high'].rolling(lookback, center=True).max()
    swing_lows = df['low'].rolling(lookback, center=True).min()

    # Distance to liquidity
    features['bsl_distance'] = (swing_highs - df['close']) / df['close']
    features['ssl_distance'] = (df['close'] - swing_lows) / df['close']

    # Liquidity density (how many levels nearby)
    features['bsl_density'] = count_levels_above(df, lookback)
    features['ssl_density'] = count_levels_below(df, lookback)

    # Recent sweep history
    features['bsl_sweeps_recent'] = count_bsl_sweeps(df, window=50)
    features['ssl_sweeps_recent'] = count_ssl_sweeps(df, window=50)

    # Volume profile near liquidity
    features['volume_at_bsl'] = calculate_volume_at_level(df, swing_highs)
    features['volume_at_ssl'] = calculate_volume_at_level(df, swing_lows)

    # Market structure
    features['higher_highs_forming'] = (df['high'] > df['high'].shift(1)).rolling(10).sum()
    features['lower_lows_forming'] = (df['low'] < df['low'].shift(1)).rolling(10).sum()

    # Volatility expansion (often precedes sweeps)
    atr = calculate_atr(df, 14)
    features['atr_expanding'] = (atr > atr.shift(5)).astype(int)
    features['volatility_regime'] = classify_volatility(df)

    # Price proximity to levels
    features['near_bsl'] = (features['bsl_distance'] < 0.01).astype(int)  # Within 1%
    features['near_ssl'] = (features['ssl_distance'] < 0.01).astype(int)

    # Time since last sweep
    features['bars_since_bsl_sweep'] = calculate_bars_since_sweep(df, 'bsl')
    features['bars_since_ssl_sweep'] = calculate_bars_since_sweep(df, 'ssl')

    # Manipulation signals
    features['false_breakouts_recent'] = count_false_breakouts(df, window=30)
    features['whipsaw_intensity'] = calculate_whipsaw_intensity(df)

    # AMD phase context
    features['in_manipulation_phase'] = check_manipulation_phase(df)

    return pd.DataFrame(features)
```

### Targets

```python
def label_liquidity_sweep(df, i, forward_window=10, sweep_threshold=0.005):
    """Labels whether a liquidity sweep will occur."""
    if i + forward_window >= len(df):
        return np.nan

    current_high = df['high'].iloc[max(0, i - 20):i].max()
    current_low = df['low'].iloc[max(0, i - 20):i].min()
    future = df.iloc[i:i + forward_window]

    # BSL sweep (sweep of highs)
    bsl_sweep_price = current_high * (1 + sweep_threshold)
    bsl_swept = (future['high'] >= bsl_sweep_price).any()

    # SSL sweep (sweep of lows)
    ssl_sweep_price = current_low * (1 - sweep_threshold)
    ssl_swept = (future['low'] <=
                 ssl_sweep_price).any()

    # Return binary targets
    return {
        'bsl_sweep': 1 if bsl_swept else 0,
        'ssl_sweep': 1 if ssl_swept else 0,
        'any_sweep': 1 if (bsl_swept or ssl_swept) else 0
    }
```

### Output

```python
@dataclass
class LiquidityPrediction:
    liquidity_type: str       # 'BSL' or 'SSL'
    sweep_probability: float  # 0-1
    liquidity_level: float    # Price level
    distance_pct: float       # Distance to the level
    density: int              # Number of levels nearby
    expected_timing: int      # Bars until sweep
    risk_score: float         # Higher = more likely to be trapped
    timestamp: pd.Timestamp


# Example
prediction = liquidity_hunter.predict(current_data)
# [
#     LiquidityPrediction(
#         liquidity_type='BSL',
#         sweep_probability=0.72,
#         liquidity_level=89450,
#         distance_pct=0.0011,  # 0.11% away
#         density=3,
#         expected_timing=5,    # ~5 bars
#         risk_score=0.68       # High risk of reversal after the sweep
#     )
# ]
```

### Metrics

| Metric | Target |
|--------|--------|
| **Precision** | >70% |
| **Recall** | >60% |
| **ROC-AUC** | >0.75 |
| **False Positive Rate** | <30% |

---

## Model 5: OrderFlowAnalyzer

### Description

Analyzes order flow to detect institutional accumulation/distribution.
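The buy/sell split this model relies on can be sketched in isolation. `bar_volume_imbalance` is an illustrative helper (not part of the module): it infers direction from the bar body, the same close-vs-open proxy the sequence features below use; true tick-level classification would be more precise:

```python
import pandas as pd


def bar_volume_imbalance(df: pd.DataFrame) -> pd.Series:
    """Per-bar volume imbalance in [-1, 1]: +1 all buying, -1 all selling.

    Volume on up-closes (close > open) counts as buy volume, volume on
    down-closes as sell volume; doji bars (close == open) score 0.
    Assumes volume > 0 on every bar.
    """
    buy = df['volume'].where(df['close'] > df['open'], 0.0)
    sell = df['volume'].where(df['close'] < df['open'], 0.0)
    return (buy - sell) / df['volume']
```

Summing `buy - sell` cumulatively over a window gives the cumulative volume delta (CVD) that the LSTM consumes as one of its sequence features.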
**Note:** optional model; it requires granular volume data.

### Architecture

**Type:** LSTM / Transformer (for temporal sequences)
**Output:** accumulation/distribution score

```python
import torch
import torch.nn as nn


class OrderFlowAnalyzer(nn.Module):
    """Analyzes order flow using an LSTM."""

    def __init__(self, input_dim=10, hidden_dim=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_dim,
            hidden_dim,
            num_layers,
            batch_first=True,
            dropout=0.2
        )
        self.fc = nn.Sequential(
            nn.Linear(hidden_dim, 32),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(32, 3)  # accumulation, neutral, distribution
        )

    def forward(self, x):
        # x shape: (batch, sequence, features)
        lstm_out, _ = self.lstm(x)
        # Take the last output
        last_out = lstm_out[:, -1, :]
        output = self.fc(last_out)
        # Note: if training with nn.CrossEntropyLoss, return the raw
        # logits instead of applying softmax here.
        return torch.softmax(output, dim=1)
```

### Input Features (Sequence)

**Dimension:** 10 features x 50 timesteps

```python
def extract_order_flow_sequence(df, sequence_length=50):
    """Extracts a sequence of order-flow features."""
    features = []

    for i in range(len(df) - sequence_length + 1):
        window = df.iloc[i:i + sequence_length]

        # Intermediate series are computed up front; the original draft
        # referenced them through `window[...]` before they existed.
        buy_volume = window['volume'] * (window['close'] > window['open']).astype(int)
        sell_volume = window['volume'] * (window['close'] < window['open']).astype(int)
        upticks = count_upticks(window)      # Tick data (if available)
        downticks = count_downticks(window)
        cumulative_delta = (buy_volume - sell_volume).cumsum()

        sequence_features = {
            # Volume delta
            'volume_delta': window['volume'] - window['volume'].shift(1),
            # Buy/Sell imbalance
            'buy_volume': buy_volume,
            'sell_volume': sell_volume,
            'imbalance': (buy_volume - sell_volume) / window['volume'],
            # Large orders detection
            'large_orders': (window['volume'] > window['volume'].rolling(20).mean() * 2).astype(int),
            'upticks': upticks,
            'downticks': downticks,
            'tick_imbalance': (upticks - downticks) / (upticks + downticks + 1),
            # Cumulative metrics
            'cumulative_delta': cumulative_delta,
            'cvd_slope': cumulative_delta.diff(5) / 5
        }
        features.append(pd.DataFrame(sequence_features))

    return np.array([f.values for f in features])
```

### Output

```python
@dataclass
class OrderFlowPrediction:
    flow_type: str  # 'accumulation', 'distribution', 'neutral'
    confidence: float
    imbalance_score: float         # -1 (selling) to +1 (buying)
    institutional_activity: float  # 0-1
    large_orders_detected: int
    cvd_trend: str                 # 'up', 'down', 'flat'
    timestamp: pd.Timestamp
```

---

## Meta-Model: StrategyOrchestrator

### Description

Combines all of the previous models to generate the final trading signal.

### Architecture

**Type:** weighted ensemble + rule-based

```python
class StrategyOrchestrator:
    """Meta-model that orchestrates all predictions."""

    def __init__(self, models, config=None):
        self.amd_detector = models['amd_detector']
        self.range_predictor = models['range_predictor']
        self.tpsl_classifier = models['tpsl_classifier']
        self.liquidity_hunter = models['liquidity_hunter']
        self.order_flow_analyzer = models.get('order_flow_analyzer')

        self.config = config or self._default_config()
        self.weights = self.config['weights']

    def _default_config(self):
        return {
            'weights': {
                'amd': 0.30,
                'range': 0.25,
                'tpsl': 0.25,
                'liquidity': 0.15,
                'order_flow': 0.05
            },
            'min_confidence': 0.60,
            'min_tp_probability': 0.55,
            'risk_multiplier': 0.02  # 2% risk per trade
        }

    def generate_signal(self, market_data, current_price):
        """Generates a trading signal by combining all models."""
        signal = {
            'action': 'hold',
            'confidence': 0.0,
            'entry_price': current_price,
            'stop_loss': None,
            'take_profit': None,
            'position_size': 0.0,
            'reasoning': [],
            'model_outputs': {}
        }

        # 1. AMD phase
        amd_pred = self.amd_detector.predict(market_data)
        signal['model_outputs']['amd'] = amd_pred

        if amd_pred['confidence'] < 0.6:
            signal['reasoning'].append('Low AMD confidence - avoiding trade')
            return signal

        # 2. Range prediction
        range_pred = self.range_predictor.predict(market_data, current_price)
        signal['model_outputs']['range'] = range_pred

        # 3.
        # TPSL probability
        tpsl_pred = self.tpsl_classifier.predict(market_data, current_price)
        signal['model_outputs']['tpsl'] = tpsl_pred

        # 4. Liquidity analysis
        liq_pred = self.liquidity_hunter.predict(market_data)
        signal['model_outputs']['liquidity'] = liq_pred

        # 5. Order flow (if available)
        if self.order_flow_analyzer:
            flow_pred = self.order_flow_analyzer.predict(market_data)
            signal['model_outputs']['order_flow'] = flow_pred

        # === DECISION LOGIC ===

        # Determine bias from AMD
        if amd_pred['phase'] == 'accumulation':
            bias = 'bullish'
            signal['reasoning'].append(f'AMD: Accumulation phase (conf: {amd_pred["confidence"]:.2%})')
        elif amd_pred['phase'] == 'distribution':
            bias = 'bearish'
            signal['reasoning'].append(f'AMD: Distribution phase (conf: {amd_pred["confidence"]:.2%})')
        elif amd_pred['phase'] == 'manipulation':
            signal['reasoning'].append('AMD: Manipulation phase - avoiding entry')
            return signal
        else:
            signal['reasoning'].append('AMD: Neutral phase - no clear direction')
            return signal

        # Check range prediction alignment
        if bias == 'bullish':
            range_alignment = range_pred['15m'].delta_high > range_pred['15m'].delta_low * 1.5
        else:
            range_alignment = range_pred['15m'].delta_low > range_pred['15m'].delta_high * 1.5

        if not range_alignment:
            signal['reasoning'].append('Range prediction does not align with bias')
            return signal

        signal['reasoning'].append('Range prediction aligned')

        # Check TPSL probability. TPSLPrediction actions are 'long'/'short',
        # so the bullish/bearish bias is mapped explicitly.
        direction = 'long' if bias == 'bullish' else 'short'
        relevant_tpsl = [p for p in tpsl_pred if p.recommended_action == direction]

        if not relevant_tpsl:
            signal['reasoning'].append('No TPSL prediction matching the current bias')
            return signal
        if relevant_tpsl[0].prob_tp_first < self.config['min_tp_probability']:
            signal['reasoning'].append(f'Low TP probability: {relevant_tpsl[0].prob_tp_first:.2%}')
            return signal

        signal['reasoning'].append(f'High TP probability: {relevant_tpsl[0].prob_tp_first:.2%}')

        # Check liquidity risk
        if liq_pred:
            liquidity_risk = any(p.sweep_probability > 0.7 and p.distance_pct < 0.005 for p in liq_pred)
            if liquidity_risk:
                signal['reasoning'].append('High liquidity sweep risk nearby')
                # Reduce position size
                position_multiplier = 0.5
            else:
                position_multiplier = 1.0
        else:
            position_multiplier = 1.0

        # === CALCULATE CONFIDENCE ===
        confidence_score = 0.0

        # AMD contribution
        confidence_score += self.weights['amd'] * amd_pred['confidence']

        # Range contribution
        range_conf = (range_pred['15m'].confidence_high + range_pred['15m'].confidence_low) / 2
        confidence_score += self.weights['range'] * range_conf

        # TPSL contribution
        tpsl_conf = relevant_tpsl[0].confidence
        confidence_score += self.weights['tpsl'] * tpsl_conf

        # Liquidity contribution
        if liq_pred:
            liq_conf = 1 - max(p.risk_score for p in liq_pred)  # Inverse of risk
            confidence_score += self.weights['liquidity'] * liq_conf

        # Note: the 'order_flow' weight is reserved; it is not yet part of the sum.
        signal['confidence'] = confidence_score

        if confidence_score < self.config['min_confidence']:
            signal['reasoning'].append(f'Overall confidence too low: {confidence_score:.2%}')
            return signal

        # === GENERATE ENTRY ===
        signal['action'] = 'long' if bias == 'bullish' else 'short'
        signal['entry_price'] = current_price

        # Use TPSL predictions
        tpsl_entry = relevant_tpsl[0]
        signal['stop_loss'] = tpsl_entry.sl_price
        signal['take_profit'] = tpsl_entry.tp_price

        # Calculate position size
        account_risk = self.config['risk_multiplier']  # 2% of account
        price_risk = abs(current_price - tpsl_entry.sl_price) / current_price
        signal['position_size'] = (account_risk / price_risk) * position_multiplier

        signal['reasoning'].append(f'Signal generated: {signal["action"].upper()}')
        signal['reasoning'].append(f'Confidence: {confidence_score:.2%}')
        signal['reasoning'].append(
            f'R:R: {(abs(tpsl_entry.tp_price - current_price) / abs(current_price - tpsl_entry.sl_price)):.2f}:1'
        )

        return signal
```

### Decision Pipeline

```
Market Data
     │
     ▼
┌─────────────┐
│ AMDDetector │──── Phase = Accumulation? ── NO → HOLD
└──────┬──────┘     Confidence > 0.6?
       │ YES
       ▼
┌────────────────┐
│ RangePredictor │
└──────┬─────────┘
       │ ΔHigh > ΔLow * 1.5?
       │ YES
       ▼
┌────────────────┐
│ TPSLClassifier │
└──────┬─────────┘
       │ P(TP first) > 0.55?
       │ YES
       ▼
┌─────────────────┐
│ LiquidityHunter │
└──────┬──────────┘
       │ Sweep risk low?
       │ YES
       ▼
┌─────────────────┐
│   Confidence    │
│   Calculation   │
└──────┬──────────┘
       │ Total > 0.60?
       │ YES
       ▼
┌─────────────────┐
│   LONG SIGNAL   │
│  Entry, SL, TP  │
└─────────────────┘
```

### Output

```python
@dataclass
class TradingSignal:
    action: str        # 'long', 'short', 'hold'
    confidence: float  # 0-1
    entry_price: float
    stop_loss: float
    take_profit: float
    position_size: float   # Units or % of account
    risk_reward_ratio: float
    expected_value: float  # EV calculation
    reasoning: List[str]   # Why this signal
    model_outputs: Dict    # All model predictions
    timestamp: pd.Timestamp

    # Metadata
    symbol: str
    horizon: str
    amd_phase: str
    killzone: str


# Full example
signal = orchestrator.generate_signal(market_data, current_price=89350)
# TradingSignal(
#     action='long',
#     confidence=0.73,
#     entry_price=89350,
#     stop_loss=89082,
#     take_profit=89886,
#     position_size=0.15,    # 15% of account
#     risk_reward_ratio=2.0,
#     expected_value=0.214,  # +21.4% EV
#     reasoning=[
#         'AMD: Accumulation phase (conf: 78%)',
#         'Range prediction aligned',
#         'High TP probability: 68%',
#         'Signal generated: LONG',
#         'Confidence: 73%',
#         'R:R: 2.00:1'
#     ],
#     amd_phase='accumulation',
#     killzone='ny_am'
# )
```

---

## Training Pipeline

### Complete Workflow

```python
class MLTrainingPipeline:
    """End-to-end training pipeline."""

    def __init__(self, data_path, config):
        self.data_path = data_path
        self.config = config
        self.models = {}

    def run(self):
        """Runs the full pipeline."""
        # 1. Load & prepare data
        print("1. Loading data...")
        df = self.load_data()

        # 2. Feature engineering
        print("2. Engineering features...")
        features = self.engineer_features(df)

        # 3. Target labeling
        print("3. Labeling targets...")
        targets = self.label_targets(df)

        # 4. Train-test split (temporal)
        print("4. "
Splitting data...") X_train, X_val, X_test, y_train, y_val, y_test = self.temporal_split( features, targets ) # 5. Train AMDDetector print("5. Training AMDDetector...") self.models['amd_detector'] = self.train_amd_detector( X_train, y_train['amd'], X_val, y_val['amd'] ) # 6. Generate AMD features for next models print("6. Generating AMD features...") amd_features_train = self.models['amd_detector'].predict_proba(X_train) amd_features_val = self.models['amd_detector'].predict_proba(X_val) # 7. Train RangePredictor print("7. Training RangePredictor...") X_range_train = np.hstack([X_train, amd_features_train]) X_range_val = np.hstack([X_val, amd_features_val]) self.models['range_predictor'] = self.train_range_predictor( X_range_train, y_train['range'], X_range_val, y_val['range'] ) # 8. Generate range predictions for TPSL print("8. Generating range predictions...") range_preds_train = self.models['range_predictor'].predict(X_range_train) range_preds_val = self.models['range_predictor'].predict(X_range_val) # 9. Train TPSLClassifier print("9. Training TPSLClassifier...") X_tpsl_train = np.hstack([X_range_train, range_preds_train]) X_tpsl_val = np.hstack([X_range_val, range_preds_val]) self.models['tpsl_classifier'] = self.train_tpsl_classifier( X_tpsl_train, y_train['tpsl'], X_tpsl_val, y_val['tpsl'] ) # 10. Train LiquidityHunter print("10. Training LiquidityHunter...") self.models['liquidity_hunter'] = self.train_liquidity_hunter( X_train, y_train['liquidity'], X_val, y_val['liquidity'] ) # 11. Evaluate all models print("11. Evaluating models...") self.evaluate_all(X_test, y_test) # 12. Save models print("12. 
Saving models...") self.save_all_models() print("Training complete!") return self.models def temporal_split(self, features, targets, train_pct=0.7, val_pct=0.15): """Split temporal (sin shuffle)""" n = len(features) train_end = int(n * train_pct) val_end = int(n * (train_pct + val_pct)) return ( features[:train_end], features[train_end:val_end], features[val_end:], targets[:train_end], targets[train_end:val_end], targets[val_end:] ) ``` ### Cross-Validation Temporal ```python from sklearn.model_selection import TimeSeriesSplit def temporal_cross_validation(model, X, y, n_splits=5): """ Cross-validation respetando orden temporal """ tscv = TimeSeriesSplit(n_splits=n_splits) scores = [] for fold, (train_idx, val_idx) in enumerate(tscv.split(X)): print(f"Fold {fold + 1}/{n_splits}") X_train, X_val = X[train_idx], X[val_idx] y_train, y_val = y[train_idx], y[val_idx] # Train model.fit(X_train, y_train) # Evaluate y_pred = model.predict(X_val) score = accuracy_score(y_val, y_pred) scores.append(score) print(f" Accuracy: {score:.4f}") print(f"\nMean Accuracy: {np.mean(scores):.4f} ± {np.std(scores):.4f}") return scores ``` --- ## M\u00e9tricas y Evaluaci\u00f3n ### M\u00e9tricas por Modelo ```python class ModelEvaluator: """ Evaluaci\u00f3n completa de modelos """ @staticmethod def evaluate_amd_detector(model, X_test, y_test): """Evaluar AMDDetector""" y_pred = model.predict(X_test) y_pred_proba = model.predict_proba(X_test) metrics = { 'accuracy': accuracy_score(y_test, y_pred), 'macro_f1': f1_score(y_test, y_pred, average='macro'), 'weighted_f1': f1_score(y_test, y_pred, average='weighted'), 'classification_report': classification_report(y_test, y_pred), 'confusion_matrix': confusion_matrix(y_test, y_pred) } # Per-class metrics for class_idx, class_name in model.label_encoder.items(): mask = y_test == class_idx if mask.sum() > 0: metrics[f'{class_name}_precision'] = precision_score( y_test == class_idx, y_pred == class_idx ) metrics[f'{class_name}_recall'] = 
recall_score( y_test == class_idx, y_pred == class_idx ) return metrics @staticmethod def evaluate_range_predictor(model, X_test, y_test): """Evaluar RangePredictor""" predictions = model.predict(X_test) metrics = {} for horizon in ['15m', '1h']: for target_type in ['high', 'low']: y_true = y_test[f'delta_{target_type}_{horizon}'] y_pred = [p.delta_high if target_type == 'high' else p.delta_low for p in predictions if p.horizon == horizon] metrics[f'{horizon}_{target_type}_mae'] = mean_absolute_error(y_true, y_pred) metrics[f'{horizon}_{target_type}_rmse'] = np.sqrt(mean_squared_error(y_true, y_pred)) metrics[f'{horizon}_{target_type}_r2'] = r2_score(y_true, y_pred) # Directional accuracy direction_true = np.sign(y_true) direction_pred = np.sign(y_pred) metrics[f'{horizon}_{target_type}_directional_acc'] = ( direction_true == direction_pred ).mean() return metrics @staticmethod def evaluate_tpsl_classifier(model, X_test, y_test): """Evaluar TPSLClassifier""" metrics = {} for horizon in ['15m', '1h']: for rr in ['rr_2_1', 'rr_3_1']: target_key = f'tp_first_{horizon}_{rr}' y_true = y_test[target_key].dropna() if len(y_true) == 0: continue X_valid = X_test[y_test[target_key].notna()] y_pred = model.predict_proba(X_valid, horizon, rr) y_pred_class = (y_pred > 0.5).astype(int) metrics[f'{horizon}_{rr}_accuracy'] = accuracy_score(y_true, y_pred_class) metrics[f'{horizon}_{rr}_roc_auc'] = roc_auc_score(y_true, y_pred) metrics[f'{horizon}_{rr}_precision'] = precision_score(y_true, y_pred_class) metrics[f'{horizon}_{rr}_recall'] = recall_score(y_true, y_pred_class) metrics[f'{horizon}_{rr}_f1'] = f1_score(y_true, y_pred_class) return metrics ``` ### Backtesting de Señales ```python class SignalBacktester: """ Backtesting de se\u00f1ales generadas """ def __init__(self, initial_capital=10000): self.initial_capital = initial_capital self.capital = initial_capital self.trades = [] self.equity_curve = [] def run(self, df, signals): """Ejecuta backtest""" position = None for i, 
signal in enumerate(signals): if signal['action'] == 'hold': continue # Entry if position is None and signal['action'] in ['long', 'short']: position = { 'type': signal['action'], 'entry_price': signal['entry_price'], 'entry_time': signal['timestamp'], 'stop_loss': signal['stop_loss'], 'take_profit': signal['take_profit'], 'size': signal['position_size'] } # Check exit if position is not None: # Simulate price movement future_bars = df[df.index > signal['timestamp']].head(100) for idx, row in future_bars.iterrows(): # Check SL if position['type'] == 'long' and row['low'] <= position['stop_loss']: self._close_position(position, position['stop_loss'], idx, 'SL') position = None break # Check TP elif position['type'] == 'long' and row['high'] >= position['take_profit']: self._close_position(position, position['take_profit'], idx, 'TP') position = None break self.equity_curve.append(self.capital) return self._calculate_metrics() def _close_position(self, position, exit_price, exit_time, exit_reason): """Cierra posici\u00f3n""" if position['type'] == 'long': pnl = (exit_price - position['entry_price']) / position['entry_price'] else: pnl = (position['entry_price'] - exit_price) / position['entry_price'] pnl_amount = self.capital * position['size'] * pnl self.capital += pnl_amount self.trades.append({ 'type': position['type'], 'entry_price': position['entry_price'], 'exit_price': exit_price, 'entry_time': position['entry_time'], 'exit_time': exit_time, 'exit_reason': exit_reason, 'pnl_pct': pnl * 100, 'pnl_amount': pnl_amount }) def _calculate_metrics(self): """Calcula m\u00e9tricas de performance""" if not self.trades: return {} trades_df = pd.DataFrame(self.trades) total_return = (self.capital - self.initial_capital) / self.initial_capital num_trades = len(trades_df) num_wins = (trades_df['pnl_pct'] > 0).sum() num_losses = (trades_df['pnl_pct'] < 0).sum() win_rate = num_wins / num_trades if num_trades > 0 else 0 avg_win = trades_df[trades_df['pnl_pct'] > 
0]['pnl_pct'].mean() if num_wins > 0 else 0 avg_loss = trades_df[trades_df['pnl_pct'] < 0]['pnl_pct'].mean() if num_losses > 0 else 0 # Sharpe ratio returns = pd.Series(self.equity_curve).pct_change().dropna() sharpe = np.sqrt(252) * (returns.mean() / returns.std()) if returns.std() > 0 else 0 # Max drawdown equity_series = pd.Series(self.equity_curve) cummax = equity_series.cummax() drawdown = (equity_series - cummax) / cummax max_drawdown = drawdown.min() return { 'total_return_pct': total_return * 100, 'final_capital': self.capital, 'num_trades': num_trades, 'num_wins': num_wins, 'num_losses': num_losses, 'win_rate': win_rate * 100, 'avg_win_pct': avg_win, 'avg_loss_pct': avg_loss, 'profit_factor': abs(avg_win * num_wins / (avg_loss * num_losses)) if num_losses > 0 else np.inf, 'sharpe_ratio': sharpe, 'max_drawdown_pct': max_drawdown * 100 } ``` --- ## Producci\u00f3n y Deployment ### FastAPI Service ```python from fastapi import FastAPI, HTTPException from pydantic import BaseModel app = FastAPI(title="Trading Platform ML Service") # Load models orchestrator = StrategyOrchestrator.load('models/orchestrator_v1.pkl') class PredictionRequest(BaseModel): symbol: str timeframe: str = '5m' include_reasoning: bool = True class PredictionResponse(BaseModel): signal: TradingSignal metadata: Dict @app.post("/api/signal") async def get_trading_signal(request: PredictionRequest): """ Genera se\u00f1al de trading """ try: # Fetch market data market_data = fetch_market_data(request.symbol, request.timeframe) # Generate signal signal = orchestrator.generate_signal( market_data, current_price=market_data['close'].iloc[-1] ) return PredictionResponse( signal=signal, metadata={ 'model_version': '1.0.0', 'latency_ms': 45, 'timestamp': datetime.now().isoformat() } ) except Exception as e: raise HTTPException(status_code=500, detail=str(e)) @app.get("/api/health") async def health_check(): return { 'status': 'healthy', 'models_loaded': True, 'version': '1.0.0' } ``` ### Monitoring 
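The `ml_model_accuracy` gauge defined in the Prometheus setup below needs a live accuracy value, which can only be computed from realized trade outcomes (a position resolving at TP or SL), never from open positions. A minimal rolling-window tracker is sketched here; `RollingAccuracy` and its wiring are illustrative assumptions, not part of the deployed service:

```python
from collections import deque

class RollingAccuracy:
    """Hit rate over the last `window` resolved signals.

    Outcomes are recorded only when a trade closes (TP or SL hit),
    so the metric always reflects realized trades.
    """

    def __init__(self, window: int = 200):
        self.outcomes = deque(maxlen=window)

    def record(self, predicted_direction: str, realized_direction: str) -> None:
        """Stores whether the signal's direction matched the realized move."""
        self.outcomes.append(predicted_direction == realized_direction)

    @property
    def value(self) -> float:
        """Current rolling accuracy in [0, 1]; 0.0 before any outcome."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

tracker = RollingAccuracy(window=3)
tracker.record('long', 'long')    # hit
tracker.record('long', 'short')   # miss
tracker.record('short', 'short')  # hit
print(round(tracker.value, 4))    # 0.6667
```

The value can then be pushed to the gauge periodically, e.g. `model_accuracy.labels(model_name='orchestrator').set(tracker.value)`, so alerting thresholds and the retraining trigger operate on live accuracy rather than on offline test metrics.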
```python
import prometheus_client as prom

# Metrics
prediction_counter = prom.Counter('ml_predictions_total', 'Total predictions')
prediction_latency = prom.Histogram('ml_prediction_latency_seconds', 'Prediction latency')
model_accuracy = prom.Gauge('ml_model_accuracy', 'Model accuracy', ['model_name'])

@prediction_latency.time()
def generate_signal_monitored(data):
    prediction_counter.inc()
    signal = orchestrator.generate_signal(data)
    return signal
```

### Retraining Pipeline

```python
class AutoRetrainingPipeline:
    """Automatic retraining pipeline."""

    def __init__(self, config, schedule='weekly'):
        self.config = config
        self.schedule = schedule
        self.performance_threshold = 0.70

    def should_retrain(self):
        """Decides whether retraining is needed."""
        # Check recent performance
        recent_accuracy = self.get_recent_accuracy()
        if recent_accuracy < self.performance_threshold:
            return True, 'Performance degradation'

        # Check data drift
        drift_detected = self.detect_data_drift()
        if drift_detected:
            return True, 'Data drift detected'

        return False, None

    def execute_retraining(self):
        """Runs the retraining."""
        print("Starting retraining...")

        # Fetch new data
        new_data = self.fetch_latest_data()

        # Retrain all models
        pipeline = MLTrainingPipeline(new_data, self.config)
        new_models = pipeline.run()

        # Validate new models
        if self.validate_new_models(new_models):
            # Deploy new models
            self.deploy_models(new_models)
            print("Retraining complete. New models deployed.")
        else:
            print("Validation failed. Keeping old models.")
```

---

**Documento Generado:** 2025-12-05
**Próxima Revisión:** 2025-Q1
**Contacto:** ml-engineering@trading.ai