---
id: "README"
title: "ML Signals and Predictions"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-02-06"
---
# OQI-006: ML Signals and Predictions
**Status:** ✅ Implemented
**Date:** 2025-12-05
**Module:** `apps/ml-services`

---
## Description
An XGBoost-based price prediction system that forecasts:
- The expected maximum price over the prediction horizon
- The expected minimum price over the prediction horizon
- The confidence level of the prediction
---
## Architecture
```
┌─────────────────┐     ┌─────────────────────────────────────────┐
│   Binance API   │────▶│          ML SERVICES (FastAPI)          │
│  (Market Data)  │     │                Port 8000                │
└─────────────────┘     │                                         │
                        │  ┌──────────────┐    ┌──────────────┐   │
                        │  │  MarketData  │    │   XGBoost    │   │
                        │  │   Fetcher    │────│  Predictor   │   │
                        │  └──────────────┘    └──────────────┘   │
                        │         │                   │           │
                        │         ▼                   ▼           │
                        │  ┌──────────────┐    ┌──────────────┐   │
                        │  │   Feature    │    │   Training   │   │
                        │  │ Engineering  │    │   Pipeline   │   │
                        │  └──────────────┘    └──────────────┘   │
                        └─────────────────────────────────────────┘
```
---
## Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check |
| GET | `/api/stats` | Service status |
| GET | `/api/predict/{symbol}` | Price predictions |
| POST | `/api/train/{symbol}` | Train the model |
| GET | `/api/training/status` | Training status |
| GET | `/api/signals/{symbol}` | Trading signals |
| GET | `/api/indicators/{symbol}` | Technical indicators |
| WS | `/ws/{symbol}` | Real-time predictions |
---
## XGBoost Model
### Configuration
```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    n_estimators: int = 100        # Number of trees
    max_depth: int = 6             # Maximum tree depth
    learning_rate: float = 0.1     # Learning rate
    subsample: float = 0.8         # Row subsample per tree
    colsample_bytree: float = 0.8  # Column subsample per tree
    min_child_weight: int = 1      # Minimum sum of instance weight in a child
    random_state: int = 42         # Seed for reproducibility
```
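As a rough sketch of how these settings might be consumed, the config can be flattened into keyword arguments for xgboost's scikit-learn style regressor. The helper `to_xgb_params` is hypothetical and not part of the documented module, and using one regressor per target is only inferred from the `xgb_high.json` and `xgb_low.json` files listed under the file layout.

```python
from dataclasses import asdict, dataclass


@dataclass
class ModelConfig:
    n_estimators: int = 100
    max_depth: int = 6
    learning_rate: float = 0.1
    subsample: float = 0.8
    colsample_bytree: float = 0.8
    min_child_weight: int = 1
    random_state: int = 42


def to_xgb_params(config: ModelConfig) -> dict:
    """Hypothetical helper: flatten the config into keyword arguments."""
    return asdict(config)


# Override a single hyperparameter and keep the documented defaults for the rest.
params = to_xgb_params(ModelConfig(max_depth=4))
print(params)

# With xgboost installed, the same dict could configure one regressor per target
# (an assumption about the implementation, not confirmed by this document):
#   from xgboost import XGBRegressor
#   high_model = XGBRegressor(**params)
#   low_model = XGBRegressor(**params)
```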
### Features (30+)
**Volatility:**
- `volatility_5`, `volatility_10`, `volatility_20`, `volatility_50`
- `atr_5`, `atr_10`, `atr_20`, `atr_50`

**Momentum:**
- `momentum_5`, `momentum_10`, `momentum_20`
- `roc_5`, `roc_10`, `roc_20`

**Moving Averages:**
- `sma_5`, `sma_10`, `sma_20`, `sma_50`
- `ema_5`, `ema_10`, `ema_20`, `ema_50`
- `sma_ratio_5`, `sma_ratio_10`, `sma_ratio_20`, `sma_ratio_50`

**Indicators:**
- `rsi_14` - Relative Strength Index
- `macd`, `macd_signal`, `macd_histogram`
- `bb_position` - Position within the Bollinger Bands

**Volume:**
- `volume_ratio` - Ratio vs. SMA 20

**High/Low:**
- `hl_range_pct` - High-low range as a percentage
- `high_distance`, `low_distance`
- `hist_max_ratio_*`, `hist_min_ratio_*`
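
The exact formulas live in `indicators.py` and are not reproduced in this document. The sketch below shows one plausible way to compute three of the listed features with NumPy. The definitions are assumptions (`sma_ratio_n` as the close divided by its n-period SMA, `roc_n` as the n-period relative change, `hl_range_pct` as the high-low range over the close), and the function names are hypothetical.

```python
import numpy as np


def sma(values: np.ndarray, window: int) -> np.ndarray:
    """Simple moving average; the first window-1 entries are NaN (warm-up)."""
    out = np.full(values.shape, np.nan, dtype=float)
    if len(values) >= window:
        kernel = np.ones(window) / window
        out[window - 1:] = np.convolve(values, kernel, mode="valid")
    return out


def sma_ratio(close: np.ndarray, window: int) -> np.ndarray:
    """Assumed definition: close divided by its own SMA."""
    return close / sma(close, window)


def roc(close: np.ndarray, window: int) -> np.ndarray:
    """Assumed definition: relative change over `window` candles."""
    out = np.full(close.shape, np.nan, dtype=float)
    out[window:] = close[window:] / close[:-window] - 1.0
    return out


def hl_range_pct(high: np.ndarray, low: np.ndarray, close: np.ndarray) -> np.ndarray:
    """Assumed definition: high-low range as a percentage of the close."""
    return (high - low) / close * 100.0


close = np.array([100.0, 102.0, 101.0, 105.0, 110.0])
high = close + 1.0
low = close - 1.0

features = {
    "sma_ratio_3": sma_ratio(close, 3),
    "roc_2": roc(close, 2),
    "hl_range_pct": hl_range_pct(high, low, close),
}
```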
---
## Targets
The model predicts two targets:
1. **max_ratio**: Relative distance from the current price up to the future high
```
max_ratio = future_high / current_price - 1
```
2. **min_ratio**: Relative distance from the current price down to the future low
```
min_ratio = 1 - future_low / current_price
```
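The two formulas can be sketched in NumPy as follows. This assumes that `current_price` is the close of the current candle and that the future window covers the next `horizon` candles, excluding the current one; neither detail is stated in this document, and `compute_targets` is a hypothetical name. Under these definitions, the `high_ratio` and `low_ratio` fields of the prediction response appear to correspond to `1 + max_ratio` and `1 - min_ratio`.

```python
import numpy as np


def compute_targets(high, low, close, horizon):
    """Return (max_ratio, min_ratio) per candle for a horizon given in candles.

    The last `horizon` entries stay NaN because their future window is incomplete.
    """
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    close = np.asarray(close, dtype=float)
    n = len(close)
    max_ratio = np.full(n, np.nan)
    min_ratio = np.full(n, np.nan)
    for i in range(n - horizon):
        # Future window: the next `horizon` candles after candle i (assumption).
        future_high = high[i + 1 : i + 1 + horizon].max()
        future_low = low[i + 1 : i + 1 + horizon].min()
        max_ratio[i] = future_high / close[i] - 1.0
        min_ratio[i] = 1.0 - future_low / close[i]
    return max_ratio, min_ratio


max_ratio, min_ratio = compute_targets(
    high=[101.0, 103.0, 104.0, 105.0],
    low=[99.0, 100.0, 100.0, 102.0],
    close=[100.0, 101.0, 102.0, 103.0],
    horizon=2,
)
```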
---
## Prediction Horizons
| Horizon | Candles (5 min) | Duration | Use |
|---------|-----------------|----------|-----|
| Scalping | 6 | 30 min | Fast trading |
| Intraday | 18 | 90 min | Day trading |
| Swing | 36 | 3 hours | Swing trading |
| Position | 72 | 6 hours | Longer-held positions |
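
The durations in the table follow directly from the 5-minute candle size. A minimal sketch of that mapping is shown below; the constant and function names are hypothetical, while the lowercase keys mirror the horizon names used in the prediction response.

```python
CANDLE_MINUTES = 5  # candle size stated in the table header

# Candles per horizon, copied from the table above.
HORIZON_CANDLES = {"scalping": 6, "intraday": 18, "swing": 36, "position": 72}


def horizon_minutes(name: str) -> int:
    """Convert a horizon name into its duration in minutes."""
    return HORIZON_CANDLES[name] * CANDLE_MINUTES


durations = {name: horizon_minutes(name) for name in HORIZON_CANDLES}
```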
---
## Training Metrics
| Metric | Description | Typical Value |
|--------|-------------|---------------|
| high_mae | Mean absolute error (high) | 0.1% - 2% |
| high_rmse | Root mean squared error (high) | 0.15% - 2.5% |
| low_mae | Mean absolute error (low) | 0.1% - 2% |
| low_rmse | Root mean squared error (low) | 0.15% - 2.5% |
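
For reference, the two error measures can be computed as in the sketch below. The sample values are made up for illustration and are not results from the service. Because the targets are ratios, an error of 0.001 corresponds to 0.1% of the current price.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred) -> float:
    """Root mean squared error; penalizes large misses more than MAE does."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative max_ratio values (fractions of the current price), not real data.
actual = [0.004, 0.010, 0.002]
predicted = [0.005, 0.008, 0.002]

high_mae = mae(actual, predicted)
high_rmse = rmse(actual, predicted)
```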
---
## Prediction Example
### Request
```bash
curl "http://localhost:8000/api/predict/BTCUSDT?horizon=all"
```
### Response
```json
{
  "symbol": "BTCUSDT",
  "timestamp": "2025-12-05T18:05:08.889327Z",
  "current_price": 89388.99,
  "predictions": {
    "scalping": {
      "high": 89663.86,
      "low": 88930.53,
      "high_ratio": 1.0031,
      "low_ratio": 0.9949,
      "confidence": 0.69,
      "minutes": 30
    },
    "intraday": {
      "high": 90213.60,
      "low": 88013.61,
      "high_ratio": 1.0093,
      "low_ratio": 0.9848,
      "confidence": 0.59,
      "minutes": 90
    },
    "swing": {
      "high": 91038.21,
      "low": 86638.23,
      "high_ratio": 1.0187,
      "low_ratio": 0.9698,
      "confidence": 0.45,
      "minutes": 180
    },
    "position": {
      "high": 92687.43,
      "low": 83887.47,
      "high_ratio": 1.0378,
      "low_ratio": 0.9405,
      "confidence": 0.45,
      "minutes": 360
    }
  },
  "model_version": "1.0.0",
  "is_trained": true
}
```
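A client might reduce this payload to a per-horizon summary. The sketch below works on a trimmed copy of the response above, so it runs without the service. The `summarize` function and the 0.5 confidence cutoff are hypothetical choices for illustration, not part of the API.

```python
import json

# Trimmed copy of the example response (two horizons, only the fields used here).
payload = json.loads("""
{
  "symbol": "BTCUSDT",
  "current_price": 89388.99,
  "predictions": {
    "scalping": {"high": 89663.86, "low": 88930.53, "confidence": 0.69, "minutes": 30},
    "position": {"high": 92687.43, "low": 83887.47, "confidence": 0.45, "minutes": 360}
  }
}
""")


def summarize(payload: dict, min_confidence: float = 0.5) -> list[dict]:
    """Return the predicted range width (in % of price) for each horizon."""
    price = payload["current_price"]
    rows = []
    for name, pred in payload["predictions"].items():
        rows.append({
            "horizon": name,
            "range_pct": (pred["high"] - pred["low"]) / price * 100.0,
            # Hypothetical filter: flag predictions below the chosen cutoff.
            "usable": pred["confidence"] >= min_confidence,
        })
    return rows


rows = summarize(payload)
for row in rows:
    print(f'{row["horizon"]}: range {row["range_pct"]:.2f}% usable={row["usable"]}')
```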
---
## Training
### Start Training
```bash
curl -X POST "http://localhost:8000/api/train/BTCUSDT?samples=500"
```
### Response
```json
{
  "status": "training_started",
  "symbol": "BTCUSDT",
  "samples": 500,
  "message": "Model training started in background. Check /api/stats for progress."
}
```
### Check Status
```bash
curl http://localhost:8000/api/training/status
```
```json
{
  "training_in_progress": false,
  "is_trained": true,
  "last_training": {
    "symbol": "BTCUSDT",
    "timestamp": "2025-12-05T18:04:49.757994",
    "samples": 500,
    "metrics": {
      "high_mae": 0.00099,
      "high_rmse": 0.00141,
      "low_mae": 0.00173,
      "low_rmse": 0.00284,
      "train_samples": 355,
      "test_samples": 89
    }
  }
}
```
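Since training runs in the background, a client has to poll until `training_in_progress` becomes `false`. The following is a minimal sketch using only the standard library; `wait_for_training`, its polling interval, and its timeout are hypothetical, and the fetch function is injectable so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "http://localhost:8000/api/training/status"


def fetch_status(url: str = STATUS_URL) -> dict:
    """Fetch the training status document from the service."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def wait_for_training(fetch: Callable[[], dict] = fetch_status,
                      poll_seconds: float = 2.0,
                      timeout_seconds: float = 300.0) -> dict:
    """Poll until training finishes and return the final status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch()
        if not status["training_in_progress"]:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("training did not finish in time")
        time.sleep(poll_seconds)


# Offline demonstration with canned responses instead of a live server.
canned = iter([
    {"training_in_progress": True, "is_trained": False},
    {"training_in_progress": False, "is_trained": True},
])
final_status = wait_for_training(fetch=lambda: next(canned), poll_seconds=0.0)
```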
---
## Market Data
### Data Sources
| Source | Use | API |
|--------|-----|-----|
| Binance | Crypto (BTC, ETH) | REST + WebSocket |
| Mock Data | Testing | Generated locally |
### OHLCV Structure
```python
from dataclasses import dataclass

import numpy as np


@dataclass
class OHLCV:
    timestamp: np.ndarray  # Epoch milliseconds
    open: np.ndarray       # Open price
    high: np.ndarray       # High price
    low: np.ndarray        # Low price
    close: np.ndarray      # Close price
    volume: np.ndarray     # Volume
```
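The document does not describe how the mock data source is generated. One common approach, shown here purely as an assumption, is a seeded random walk that fills the same `OHLCV` structure; `make_mock_ohlcv` and all of its parameters are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class OHLCV:
    timestamp: np.ndarray
    open: np.ndarray
    high: np.ndarray
    low: np.ndarray
    close: np.ndarray
    volume: np.ndarray


def make_mock_ohlcv(n: int = 500, start_price: float = 100.0,
                    interval_ms: int = 300_000, seed: int = 42) -> OHLCV:
    """Generate n synthetic 5-minute candles from a geometric random walk."""
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, 0.001, size=n)
    close = start_price * np.exp(np.cumsum(returns))
    # Each candle opens where the previous one closed.
    open_ = np.concatenate(([start_price], close[:-1]))
    # Wicks extend beyond the candle body by a small random fraction.
    wick = np.abs(rng.normal(0.0, 0.0005, size=n))
    high = np.maximum(open_, close) * (1.0 + wick)
    low = np.minimum(open_, close) * (1.0 - wick)
    volume = rng.uniform(1.0, 10.0, size=n)
    timestamp = np.arange(n, dtype=np.int64) * interval_ms
    return OHLCV(timestamp, open_, high, low, close, volume)


data = make_mock_ohlcv(n=50)
```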
---
## Files
```
apps/ml-services/
├── src/
│   ├── api/
│   │   ├── server.py              # FastAPI app
│   │   └── schemas/
│   │       └── prediction.py      # Pydantic schemas
│   ├── data/
│   │   └── market_data.py         # MarketDataFetcher
│   └── models/
│       ├── xgboost_model.py       # XGBoostPredictor
│       ├── predictor.py           # MaxMinPricePredictor
│       └── indicators.py          # Technical indicators
├── trained_models/                # Saved models
│   ├── xgb_high.json
│   └── xgb_low.json
└── environment-cpu.yml            # Conda environment
```
---
## Dependencies
```yaml
# Main
- python=3.11
- fastapi=0.115
- uvicorn
- xgboost
- scikit-learn
- pandas
- numpy
- loguru
- aiohttp
- requests
```
---
## Configuration
### Environment Variables
No environment variables are required.
Optional:
```env
ML_MODEL_PATH=./trained_models
BINANCE_API_KEY=xxx # Optional, for rate limits
```
### Start the Server
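A minimal sketch of reading these optional variables is shown below. Treating `./trained_models` as the fallback when `ML_MODEL_PATH` is unset is an assumption based on the example value above, and `load_settings` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the optional settings, falling back when a variable is unset."""
    env = os.environ if env is None else env
    return {
        # Assumed default, taken from the example value in this document.
        "model_path": Path(env.get("ML_MODEL_PATH", "./trained_models")),
        # None means no API key was provided.
        "binance_api_key": env.get("BINANCE_API_KEY"),
    }


defaults = load_settings({})
custom = load_settings({"ML_MODEL_PATH": "/srv/models", "BINANCE_API_KEY": "xxx"})
```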
```bash
cd apps/ml-services
conda activate trading-ml
uvicorn src.api.server:app --host 0.0.0.0 --port 8000 --reload
```
---
## Limitations
1. **Supported symbols:** Only BTCUSDT and ETHUSDT for training
2. **Maximum horizon:** 6 hours (72 five-minute candles)
3. **Binance rate limits:** 1200 requests/min
4. **Accuracy:** Typical MAE of 0.1% to 2%
---
## Upcoming Improvements
- [ ] GRU model for sequential patterns
- [ ] XGBoost + GRU ensemble
- [ ] Support for more symbols (XAU, EUR)
- [ ] Tick-level predictions
- [ ] AutoML for hyperparameter optimization
---
## Assigned DDL Schemas
This module owns the following DDL schema:
| Schema | Tables | Description |
|--------|--------|-------------|
| **ml** | 12 | models, model_versions, predictions, signals, signal_subscriptions, backtests, backtest_results, feature_sets, training_jobs, ensemble_models, ensemble_predictions, model_metrics |

**Total tables:** 12

**DDL drift note:** Earlier documentation had no DDL schemas section. The 12 tables cover the full ML lifecycle: training (models, model_versions, training_jobs, feature_sets), prediction (predictions, signals, signal_subscriptions), evaluation (backtests, backtest_results, model_metrics), and ensembling (ensemble_models, ensemble_predictions). Updated by TASK-2026-02-06 F2.6.

---
*Documentation generated: 2025-12-05*