trading-platform/docs/02-definicion-modulos/OQI-006-ml-signals/README.md

---
id: "README"
title: "ML Signals and Predictions"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-02-06"
---

# OQI-006: ML Signals and Predictions

**Status:** ✅ Implemented
**Date:** 2025-12-05
**Module:** `apps/ml-services`

---

## Description

Price prediction system based on XGBoost. For each prediction horizon it estimates:
- The expected maximum price within the horizon
- The expected minimum price within the horizon
- A confidence level for the prediction

---

## Architecture

```
┌─────────────────┐     ┌─────────────────────────────────────────┐
│   Binance API   │────▶│          ML SERVICES (FastAPI)          │
│   (Market Data) │     │                 Port 8000                │
└─────────────────┘     │                                          │
                        │  ┌──────────────┐  ┌──────────────┐     │
                        │  │ MarketData   │  │  XGBoost     │     │
                        │  │   Fetcher    │──│  Predictor   │     │
                        │  └──────────────┘  └──────────────┘     │
                        │         │                 │              │
                        │         ▼                 ▼              │
                        │  ┌──────────────┐  ┌──────────────┐     │
                        │  │   Feature    │  │   Training   │     │
                        │  │ Engineering  │  │   Pipeline   │     │
                        │  └──────────────┘  └──────────────┘     │
                        └─────────────────────────────────────────┘
```

---

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check |
| GET | `/api/stats` | Service status |
| GET | `/api/predict/{symbol}` | Price predictions |
| POST | `/api/train/{symbol}` | Train the model |
| GET | `/api/training/status` | Training status |
| GET | `/api/signals/{symbol}` | Trading signals |
| GET | `/api/indicators/{symbol}` | Technical indicators |
| WS | `/ws/{symbol}` | Real-time predictions |

---

## XGBoost Model

### Configuration

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    n_estimators: int = 100        # Number of trees
    max_depth: int = 6             # Maximum tree depth
    learning_rate: float = 0.1     # Learning rate (shrinkage)
    subsample: float = 0.8         # Fraction of rows sampled per tree
    colsample_bytree: float = 0.8  # Fraction of features sampled per tree
    min_child_weight: int = 1      # Minimum sum of instance weight in a child
    random_state: int = 42         # Seed for reproducibility
```
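
As a minimal sketch (not the module's actual code), the configuration can be turned into keyword arguments with `dataclasses.asdict`. The field names match parameters accepted by xgboost's scikit-learn style estimators such as `xgboost.XGBRegressor`; the `to_params` helper is hypothetical and only illustrates the idea.

```python
from dataclasses import asdict, dataclass


@dataclass
class ModelConfig:
    n_estimators: int = 100
    max_depth: int = 6
    learning_rate: float = 0.1
    subsample: float = 0.8
    colsample_bytree: float = 0.8
    min_child_weight: int = 1
    random_state: int = 42


def to_params(config: ModelConfig, **overrides) -> dict:
    """Hypothetical helper: return the config as a dict, with optional overrides."""
    params = asdict(config)
    unknown = set(overrides) - set(params)
    if unknown:
        raise ValueError(f"unknown hyperparameters: {sorted(unknown)}")
    params.update(overrides)
    return params


params = to_params(ModelConfig(), max_depth=4)
# The dict could then be passed on, e.g. xgboost.XGBRegressor(**params).
```

Rejecting unknown keys early keeps a typo in an override from being silently ignored.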

### Features (30+)

**Volatility:**
- `volatility_5`, `volatility_10`, `volatility_20`, `volatility_50`
- `atr_5`, `atr_10`, `atr_20`, `atr_50`

**Momentum:**
- `momentum_5`, `momentum_10`, `momentum_20`
- `roc_5`, `roc_10`, `roc_20`

**Moving Averages:**
- `sma_5`, `sma_10`, `sma_20`, `sma_50`
- `ema_5`, `ema_10`, `ema_20`, `ema_50`
- `sma_ratio_5`, `sma_ratio_10`, `sma_ratio_20`, `sma_ratio_50`

**Indicators:**
- `rsi_14` - Relative Strength Index
- `macd`, `macd_signal`, `macd_histogram`
- `bb_position` - Position within the Bollinger Bands

**Volume:**
- `volume_ratio` - Volume relative to its 20-period SMA

**High/Low:**
- `hl_range_pct` - High-low range as a percentage
- `high_distance`, `low_distance`
- `hist_max_ratio_*`, `hist_min_ratio_*`
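
The exact formulas live in `indicators.py` and are not reproduced here. The sketch below uses common textbook definitions for a few of the features listed above; treat every definition as an assumption rather than the module's implementation.

```python
from statistics import fmean


def _require(values: list[float], needed: int) -> None:
    if len(values) < needed:
        raise ValueError(f"need at least {needed} values, got {len(values)}")


def sma(values: list[float], window: int) -> float:
    """Simple moving average of the last `window` values."""
    _require(values, window)
    return fmean(values[-window:])


def sma_ratio(closes: list[float], window: int) -> float:
    """Assumed definition: last close divided by its SMA."""
    return closes[-1] / sma(closes, window)


def momentum(closes: list[float], window: int) -> float:
    """Assumed definition: absolute price change over `window` candles."""
    _require(closes, window + 1)
    return closes[-1] - closes[-1 - window]


def roc(closes: list[float], window: int) -> float:
    """Rate of change: relative price change over `window` candles."""
    _require(closes, window + 1)
    return closes[-1] / closes[-1 - window] - 1.0


# Made-up closing prices, for illustration only.
closes = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
features = {
    "sma_5": sma(closes, 5),
    "sma_ratio_5": sma_ratio(closes, 5),
    "momentum_5": momentum(closes, 5),
    "roc_5": roc(closes, 5),
}
```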

---

## Targets

The model predicts two targets:

1. **max_ratio**: How far the future maximum lies above the current price, as a fraction of the current price
   ```
   max_ratio = future_high / current_price - 1
   ```

2. **min_ratio**: How far the future minimum lies below the current price, as a fraction of the current price
   ```
   min_ratio = 1 - future_low / current_price
   ```
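
The two formulas can be sketched in plain Python as follows. This assumes that `current_price` is the close of the current candle and that the future window starts at the next candle; neither detail is stated in this document, and `compute_targets` is a hypothetical name.

```python
def compute_targets(
    highs: list[float],
    lows: list[float],
    closes: list[float],
    index: int,
    horizon: int,
) -> tuple[float, float]:
    """Return (max_ratio, min_ratio) for the candle at `index`.

    The future window covers the `horizon` candles that follow `index`.
    """
    window = slice(index + 1, index + 1 + horizon)
    future_highs = highs[window]
    future_lows = lows[window]
    if len(future_highs) < horizon or len(future_lows) < horizon:
        raise ValueError("not enough future candles for this horizon")

    current_price = closes[index]  # assumption: close of the current candle
    max_ratio = max(future_highs) / current_price - 1
    min_ratio = 1 - min(future_lows) / current_price
    return max_ratio, min_ratio


# Made-up candles, for illustration only.
highs = [101.0, 103.0, 100.0, 104.0]
lows = [99.0, 100.0, 97.0, 101.0]
closes = [100.0, 101.0, 99.0, 102.0]
max_ratio, min_ratio = compute_targets(highs, lows, closes, index=0, horizon=3)
```

Both targets are non-negative whenever the future window trades through the current price, which keeps the two regression problems on a similar scale.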

---

## Prediction Horizons

| Horizon | Candles (5 min) | Duration | Use |
|---------|-----------------|----------|-----|
| Scalping | 6 | 30 min | Fast trading |
| Intraday | 18 | 90 min | Day trading |
| Swing | 36 | 3 hours | Swing trading |
| Position | 72 | 6 hours | Longer-term positions |
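
The table can be expressed as a small lookup, shown here as an illustrative sketch. The constant and function names are hypothetical; the lowercase keys mirror the ones used in the prediction response.

```python
CANDLE_MINUTES = 5  # every horizon is counted in 5-minute candles

HORIZON_CANDLES = {
    "scalping": 6,
    "intraday": 18,
    "swing": 36,
    "position": 72,
}


def horizon_minutes(name: str) -> int:
    """Duration of a horizon in minutes (candles x candle length)."""
    return HORIZON_CANDLES[name] * CANDLE_MINUTES


durations = {name: horizon_minutes(name) for name in HORIZON_CANDLES}
```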

---

## Training Metrics

| Metric | Description | Typical Value |
|--------|-------------|---------------|
| high_mae | Mean absolute error (high) | 0.1% - 2% |
| high_rmse | Root mean squared error (high) | 0.15% - 2.5% |
| low_mae | Mean absolute error (low) | 0.1% - 2% |
| low_rmse | Root mean squared error (low) | 0.15% - 2.5% |
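
For reference, both metrics can be computed as in the sketch below. It assumes the errors are measured on the ratio targets, so a value of `0.001` would correspond to 0.1%; the sample numbers are made up.

```python
import math


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error; penalizes large misses more than MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Made-up ratio targets and predictions, for illustration only.
y_true = [0.0030, 0.0050, 0.0020]
y_pred = [0.0025, 0.0060, 0.0020]
high_mae = mae(y_true, y_pred)
high_rmse = rmse(y_true, y_pred)
```

RMSE is never smaller than MAE on the same data, which is consistent with the typical ranges in the table.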

---

## Prediction Example

### Request

```bash
curl "http://localhost:8000/api/predict/BTCUSDT?horizon=all"
```

### Response

```json
{
  "symbol": "BTCUSDT",
  "timestamp": "2025-12-05T18:05:08.889327Z",
  "current_price": 89388.99,
  "predictions": {
    "scalping": {
      "high": 89663.86,
      "low": 88930.53,
      "high_ratio": 1.0031,
      "low_ratio": 0.9949,
      "confidence": 0.69,
      "minutes": 30
    },
    "intraday": {
      "high": 90213.60,
      "low": 88013.61,
      "high_ratio": 1.0093,
      "low_ratio": 0.9848,
      "confidence": 0.59,
      "minutes": 90
    },
    "swing": {
      "high": 91038.21,
      "low": 86638.23,
      "high_ratio": 1.0187,
      "low_ratio": 0.9698,
      "confidence": 0.45,
      "minutes": 180
    },
    "position": {
      "high": 92687.43,
      "low": 83887.47,
      "high_ratio": 1.0378,
      "low_ratio": 0.9405,
      "confidence": 0.45,
      "minutes": 360
    }
  },
  "model_version": "1.0.0",
  "is_trained": true
}
```
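
One way a client might consume this payload is sketched below: keep only the horizons above a confidence threshold and express each predicted range as a fraction of the current price. The `0.5` threshold and the `confident_ranges` helper are hypothetical; the sample is a trimmed copy of the response above.

```python
sample = {
    "current_price": 89388.99,
    "predictions": {
        "scalping": {"high": 89663.86, "low": 88930.53, "confidence": 0.69},
        "intraday": {"high": 90213.60, "low": 88013.61, "confidence": 0.59},
        "swing": {"high": 91038.21, "low": 86638.23, "confidence": 0.45},
        "position": {"high": 92687.43, "low": 83887.47, "confidence": 0.45},
    },
}


def confident_ranges(response: dict, min_confidence: float = 0.5) -> dict:
    """Map horizon name to (high - low) / current_price for confident horizons."""
    price = response["current_price"]
    ranges = {}
    for name, prediction in response["predictions"].items():
        if prediction["confidence"] >= min_confidence:
            ranges[name] = (prediction["high"] - prediction["low"]) / price
    return ranges


ranges = confident_ranges(sample)
```

With this sample only the scalping and intraday horizons pass the threshold.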

---

## Training

### Start Training

```bash
curl -X POST "http://localhost:8000/api/train/BTCUSDT?samples=500"
```

### Response

```json
{
  "status": "training_started",
  "symbol": "BTCUSDT",
  "samples": 500,
  "message": "Model training started in background. Check /api/stats for progress."
}
```

### Check Status

```bash
curl http://localhost:8000/api/training/status
```

```json
{
  "training_in_progress": false,
  "is_trained": true,
  "last_training": {
    "symbol": "BTCUSDT",
    "timestamp": "2025-12-05T18:04:49.757994",
    "samples": 500,
    "metrics": {
      "high_mae": 0.00099,
      "high_rmse": 0.00141,
      "low_mae": 0.00173,
      "low_rmse": 0.00284,
      "train_samples": 355,
      "test_samples": 89
    }
  }
}
```
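
Because training runs in the background, a client typically polls the status endpoint until `training_in_progress` becomes `false`. The following is a minimal sketch using only the standard library; the polling interval, the poll limit, and the function names are assumptions, not part of the service.

```python
import json
import time
from typing import Callable
from urllib.request import urlopen

STATUS_URL = "http://localhost:8000/api/training/status"


def fetch_status(url: str = STATUS_URL) -> dict:
    """GET the training status and decode the JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def wait_for_training(
    fetch: Callable[[], dict] = fetch_status,
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until training finishes and return the final status payload."""
    for _ in range(max_polls):
        status = fetch()
        if not status["training_in_progress"]:
            return status
        sleep(interval)
    raise TimeoutError("training did not finish within the polling budget")


# Usage against a running server:
#   status = wait_for_training()
#   print(status["last_training"]["metrics"])
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a running server.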

---

## Market Data

### Data Sources

| Source | Use | API |
|--------|-----|-----|
| Binance | Crypto (BTC, ETH) | REST + WebSocket |
| Mock Data | Testing | Generated locally |

### OHLCV Structure

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class OHLCV:
    timestamp: np.ndarray  # Epoch milliseconds
    open: np.ndarray       # Open price
    high: np.ndarray       # High price
    low: np.ndarray        # Low price
    close: np.ndarray      # Close price
    volume: np.ndarray     # Volume
```
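
Binance's REST klines endpoint returns each candle as a list whose first six entries are open time, open, high, low, close, and volume, with prices encoded as strings. The sketch below converts such rows into per-field columns using plain lists; it is not the actual `MarketDataFetcher` code, and the sample rows are made up.

```python
PRICE_FIELDS = ("open", "high", "low", "close", "volume")


def klines_to_columns(klines: list[list]) -> dict[str, list]:
    """Split Binance-style kline rows into one list per OHLCV field."""
    columns: dict[str, list] = {"timestamp": []}
    for name in PRICE_FIELDS:
        columns[name] = []

    for row in klines:
        columns["timestamp"].append(int(row[0]))  # open time, epoch ms
        for offset, name in enumerate(PRICE_FIELDS, start=1):
            columns[name].append(float(row[offset]))  # strings -> floats
    return columns


# Made-up rows in the kline layout; trailing fields are ignored here.
klines = [
    [1764957600000, "89300.00", "89450.50", "89250.10", "89388.99", "12.5"],
    [1764957900000, "89388.99", "89500.00", "89310.00", "89420.00", "9.75"],
]
columns = klines_to_columns(klines)
# With numpy available, the dataclass could be built along the lines of:
#   OHLCV(**{name: np.asarray(values) for name, values in columns.items()})
```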

---

## Files

```
apps/ml-services/
├── src/
│   ├── api/
│   │   ├── server.py          # FastAPI app
│   │   └── schemas/
│   │       └── prediction.py  # Pydantic schemas
│   ├── data/
│   │   └── market_data.py     # MarketDataFetcher
│   └── models/
│       ├── xgboost_model.py   # XGBoostPredictor
│       ├── predictor.py       # MaxMinPricePredictor
│       └── indicators.py      # Technical indicators
├── trained_models/            # Saved models
│   ├── xgb_high.json
│   └── xgb_low.json
└── environment-cpu.yml        # Conda environment
```

---

## Dependencies

```yaml
# Main
- python=3.11
- fastapi=0.115
- uvicorn
- xgboost
- scikit-learn
- pandas
- numpy
- loguru
- aiohttp
- requests
```

---

## Configuration

### Environment Variables

No environment variables are required.

Optional:
```env
ML_MODEL_PATH=./trained_models
# Optional; used for rate limits
BINANCE_API_KEY=xxx
```
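
A service could read these optional variables as sketched below. Treating `./trained_models` as the fallback when `ML_MODEL_PATH` is unset is an assumption based on the example value, and `load_settings` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the optional settings, falling back to defaults when unset."""
    env = os.environ if env is None else env
    return {
        # Assumed default: the value shown in the example above.
        "model_path": Path(env.get("ML_MODEL_PATH", "./trained_models")),
        # An empty or missing key is treated as "no API key".
        "binance_api_key": env.get("BINANCE_API_KEY") or None,
    }


defaults = load_settings({})
custom = load_settings({"ML_MODEL_PATH": "/srv/models", "BINANCE_API_KEY": "xxx"})
```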

### Start the Server

```bash
cd apps/ml-services
conda activate trading-ml
uvicorn src.api.server:app --host 0.0.0.0 --port 8000 --reload
```

---

## Limitations

1. **Supported symbols:** Only BTCUSDT and ETHUSDT for training
2. **Maximum horizon:** 6 hours (72 five-minute candles)
3. **Binance rate limits:** 1200 requests/min
4. **Accuracy:** Typical MAE of 0.1% to 2%
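
To stay under the request limit in item 3, a client can count its own calls in a sliding window. The sketch below is one possible approach, not something the module is documented to do; it counts every call equally, whereas Binance's own accounting may weight endpoints differently.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(
        self,
        max_requests: int = 1200,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sent: deque = deque()  # timestamps of accepted requests

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits within the limit."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) >= self._max:
            return False
        self._sent.append(now)
        return True


limiter = SlidingWindowLimiter()
allowed = limiter.try_acquire()  # first call is always allowed
```

A caller that receives `False` would wait briefly and retry instead of sending the request.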

---

## Upcoming Improvements

- [ ] GRU model for sequential patterns
- [ ] XGBoost + GRU ensemble
- [ ] Support for more symbols (XAU, EUR)
- [ ] Tick-level predictions
- [ ] AutoML for hyperparameter optimization

---

## Assigned DDL Schemas

This module owns the following DDL schema:

| Schema | Tables | Description |
|--------|--------|-------------|
| **ml** | 12 | models, model_versions, predictions, signals, signal_subscriptions, backtests, backtest_results, feature_sets, training_jobs, ensemble_models, ensemble_predictions, model_metrics |

**Total tables:** 12
**DDL drift note:** Previous documentation did not include a DDL schemas section. The 12 tables cover the full ML lifecycle: training (models, model_versions, training_jobs, feature_sets), prediction (predictions, signals, signal_subscriptions), evaluation (backtests, backtest_results, model_metrics), and ensembles (ensemble_models, ensemble_predictions). Updated by TASK-2026-02-06 F2.6.

---

*Documentation generated: 2025-12-05*