---
id: "ET-ML-010"
title: "PVA (Price Variation Attention) Strategy"
type: "Technical Specification"
status: "Approved"
priority: "High"
epic: "OQI-006"
project: "trading-platform"
version: "1.0.0"
created_date: "2026-01-25"
updated_date: "2026-01-25"
task_reference: "TASK-2026-01-25-ML-TRAINING-ENHANCEMENT"
---
# ET-ML-010: PVA (Price Variation Attention) Strategy
## Metadata
| Field | Value |
|-------|-------|
| **ID** | ET-ML-010 |
| **Epic** | OQI-006 - ML Signals |
| **Type** | Technical Specification |
| **Version** | 1.0.0 |
| **Status** | Approved |
| **Last updated** | 2026-01-25 |
| **Reference Task** | TASK-2026-01-25-ML-TRAINING-ENHANCEMENT |
---
## Summary
The PVA (Price Variation Attention) strategy is a hybrid model that combines a **Transformer Encoder** for representation learning with **XGBoost** for the final predictions. The model predicts the **direction** and **magnitude** of price variations over a future horizon.
### Key Characteristics
- **Time-Agnostic Design**: no temporal features (hour, day) are used, to avoid overfitting
- **6 Independent Models**: one model per symbol (XAUUSD, EURUSD, GBPUSD, USDJPY, BTCUSD, ETHUSD)
- **Hybrid Architecture**: Transformer Encoder + XGBoost head
- **Prediction Targets**: direction (bullish/bearish) and magnitude of the move
---
## Architecture
### High-Level Diagram
```
┌─────────────────────────────────────────────────────────────┐
│                          PVA MODEL                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Input: OHLCV Sequence (seq_len x n_features)               │
│    └── Returns, Acceleration, Volatility features           │
│                           │                                 │
│                           ▼                                 │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                  TRANSFORMER ENCODER                  │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │  │
│  │  │ Input Linear │─▶│ Positional   │─▶│ Encoder      │ │  │
│  │  │ Projection   │  │ Encoding     │  │ Layers (4)   │ │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘ │  │
│  │                          │                            │  │
│  │                          ▼                            │  │
│  │                 ┌──────────────────┐                  │  │
│  │                 │ Sequence Pooling │                  │  │
│  │                 │   (Mean + Last)  │                  │  │
│  │                 └──────────────────┘                  │  │
│  └───────────────────────────────────────────────────────┘  │
│                           │                                 │
│                           ▼                                 │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     XGBOOST HEAD                      │  │
│  │  ┌──────────────────────┐  ┌──────────────────────┐   │  │
│  │  │ Direction Classifier │  │ Magnitude Regressor  │   │  │
│  │  │ (binary: up/down)    │  │ (absolute magnitude) │   │  │
│  │  └──────────────────────┘  └──────────────────────┘   │  │
│  └───────────────────────────────────────────────────────┘  │
│                           │                                 │
│                           ▼                                 │
│  Output: PVAPrediction                                      │
│    - direction: float (-1 to 1)                             │
│    - magnitude: float (absolute expected move)              │
│    - confidence: float (0 to 1)                             │
└─────────────────────────────────────────────────────────────┘
```
### Transformer Encoder Components
| Component | Configuration |
|------------|---------------|
| **Layers** | 4 encoder layers |
| **d_model** | 256 |
| **n_heads** | 8 attention heads |
| **d_ff** | 1024 (feed-forward dimension) |
| **dropout** | 0.1 |
| **positional_encoding** | Sinusoidal |
| **sequence_length** | 100 candles |
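The sinusoidal positional encoding listed in the table can be sketched in NumPy as follows (a minimal illustration of the standard formulation, not the production implementation):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    positions = np.arange(seq_len)[:, np.newaxis]  # (seq_len, 1)
    div_terms = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions * div_terms)    # even dimensions
    pe[:, 1::2] = np.cos(positions * div_terms)    # odd dimensions
    return pe

# Encoding matching the PVA configuration: 100 candles, d_model=256
pe = sinusoidal_positional_encoding(seq_len=100, d_model=256)
```

The encoding is added to the projected input sequence before the encoder layers, giving the attention heads access to position information without any learned parameters.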
### XGBoost Configuration
| Parameter | Value |
|-----------|-------|
| **n_estimators** | 200 |
| **max_depth** | 6 |
| **learning_rate** | 0.05 |
| **subsample** | 0.8 |
| **colsample_bytree** | 0.8 |
| **reg_alpha** | 0.1 |
| **reg_lambda** | 1.0 |
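The table maps directly to keyword arguments shared by both XGBoost heads; a sketch of the parameter dict (the `xgboost` package is assumed and its usage is shown only as comments):

```python
# Shared hyperparameters for both XGBoost heads, taken from the table above.
XGB_PARAMS = dict(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.05,
    subsample=0.8,          # row subsampling per tree
    colsample_bytree=0.8,   # column subsampling per tree
    reg_alpha=0.1,          # L1 regularization
    reg_lambda=1.0,         # L2 regularization
)

# Usage (requires the xgboost package):
# direction_clf = xgboost.XGBClassifier(**XGB_PARAMS)
# magnitude_reg = xgboost.XGBRegressor(**XGB_PARAMS)
```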
---
## Feature Engineering
### Time-Agnostic Design
The model **does not use temporal features** (hour of day, day of week) in order to:
- Avoid overfitting to specific temporal patterns
- Improve generalization across different market conditions
- Reduce the risk of concept drift
### Implemented Features
```python
config = PVAFeatureConfig(
    return_periods=[1, 5, 10, 20],
    volatility_window=20,
    stats_window=50,
    sequence_length=100,
)
```
#### 1. Return Features
| Feature | Description | Formula |
|---------|-------------|---------|
| `return_1` | 1-period return | `(close / close.shift(1)) - 1` |
| `return_5` | 5-period return | `(close / close.shift(5)) - 1` |
| `return_10` | 10-period return | `(close / close.shift(10)) - 1` |
| `return_20` | 20-period return | `(close / close.shift(20)) - 1` |
#### 2. Acceleration Features
| Feature | Description | Formula |
|---------|-------------|---------|
| `acceleration_1` | Change in short-term momentum | `return_1 - return_1.shift(1)` |
| `acceleration_5` | Change in medium-term momentum | `return_5 - return_5.shift(5)` |
| `acceleration_20` | Change in long-term momentum | `return_20 - return_20.shift(20)` |
#### 3. Volatility Features
| Feature | Description | Formula |
|---------|-------------|---------|
| `volatility_returns` | Volatility of returns | `return_1.rolling(20).std()` |
| `volatility_ratio` | Ratio of current to average volatility | `volatility / volatility.rolling(50).mean()` |
| `range_volatility` | Volatility of candle ranges | `((high - low) / close).rolling(20).std()` |
#### 4. Statistical Features
| Feature | Description | Formula |
|---------|-------------|---------|
| `return_skew` | Skewness of returns | `return_1.rolling(50).skew()` |
| `return_kurt` | Kurtosis of returns | `return_1.rolling(50).kurt()` |
| `zscore_return` | Z-score of the return | `(return_1 - mean) / std` |
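The formulas in the four tables above can be computed directly with pandas; a condensed sketch on synthetic OHLCV data (illustrative only, not the actual `PVAFeatureEngineer`):

```python
import numpy as np
import pandas as pd

# Synthetic close/high/low series standing in for real OHLCV data
rng = np.random.default_rng(0)
n = 300
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.001, n))))
high = close * 1.001
low = close * 0.999

features = pd.DataFrame(index=close.index)

# 1. Return features
for p in (1, 5, 10, 20):
    features[f"return_{p}"] = (close / close.shift(p)) - 1

# 2. Acceleration features (change in momentum at each horizon)
for p in (1, 5, 20):
    features[f"acceleration_{p}"] = features[f"return_{p}"] - features[f"return_{p}"].shift(p)

# 3. Volatility features
features["volatility_returns"] = features["return_1"].rolling(20).std()
features["volatility_ratio"] = (
    features["volatility_returns"] / features["volatility_returns"].rolling(50).mean()
)
features["range_volatility"] = ((high - low) / close).rolling(20).std()

# 4. Statistical features (50-candle stats window)
features["return_skew"] = features["return_1"].rolling(50).skew()
features["return_kurt"] = features["return_1"].rolling(50).kurt()
roll = features["return_1"].rolling(50)
features["zscore_return"] = (features["return_1"] - roll.mean()) / roll.std()
```

Note that the longest rolling windows leave roughly the first 70 rows as NaN; in training these warm-up rows are dropped before building sequences.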
---
## Training Pipeline
### Training Flow
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Data     │────▶│   Feature   │────▶│   Encoder   │────▶│   XGBoost   │
│   Loading   │     │ Engineering │     │  Training   │     │  Training   │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
                                                                   │
                                                                   ▼
                                                         ┌─────────────────┐
                                                         │  Validation &   │
                                                         │  Model Saving   │
                                                         └─────────────────┘
```
### Trainer Configuration
```python
config = PVATrainerConfig(
    # Data
    timeframe='5m',
    batch_size=64,
    sequence_length=100,
    target_horizon=12,           # candles ahead
    # Training
    encoder_epochs=50,
    encoder_learning_rate=1e-4,
    early_stopping_patience=10,
    # Validation
    val_ratio=0.15,
    walk_forward_splits=5,
    min_train_size=10000,
)
```
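With `target_horizon: 12`, the direction and magnitude labels follow from the return 12 candles ahead. A hedged sketch (the function and column names are illustrative, not taken from the trainer code):

```python
import numpy as np
import pandas as pd

def make_targets(close: pd.Series, horizon: int = 12) -> pd.DataFrame:
    """Build direction/magnitude labels `horizon` candles ahead."""
    future_return = (close.shift(-horizon) / close) - 1
    return pd.DataFrame({
        "direction": np.sign(future_return),   # +1 bullish, -1 bearish
        "magnitude": future_return.abs(),      # absolute expected move
    })

# Toy example with horizon=2 for readability
close = pd.Series([100.0, 101.0, 102.0, 101.5, 103.0, 104.0])
targets = make_targets(close, horizon=2)
```

The last `horizon` rows have no label (NaN) and are dropped before training.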
### Walk-Forward Validation
The model uses **walk-forward validation** to evaluate performance:
```
Time ──────────────────────────────────────────────────────────────▶
Fold 1: [========= TRAIN =========][TEST]
Fold 2: [============= TRAIN =============][TEST]
Fold 3: [================= TRAIN =================][TEST]
Fold 4: [===================== TRAIN =====================][TEST]
Fold 5: [========================= TRAIN =========================][TEST]
```
**Characteristics:**
- Expanding window
- 5 folds by default
- Optional gap between train and test
- Metrics aggregated per fold
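The expanding-window scheme above can be generated with a few lines of plain Python (a sketch; the real trainer may place the gap and fold boundaries differently):

```python
def walk_forward_splits(n_samples, n_folds=5, min_train_size=100, gap=0):
    """Yield (train_end, test_start, test_end) index bounds for an
    expanding-window walk-forward validation."""
    test_size = (n_samples - min_train_size) // n_folds
    for fold in range(n_folds):
        train_end = min_train_size + fold * test_size   # train on [0, train_end)
        test_start = train_end + gap                    # optional gap
        test_end = min(test_start + test_size, n_samples)
        yield train_end, test_start, test_end

splits = list(walk_forward_splits(n_samples=1100, n_folds=5, min_train_size=100))
```

Each fold trains on everything before `train_end` and tests on the next out-of-sample block, so later folds always see strictly more history.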
---
## Evaluation Metrics
### Primary Metrics
| Metric | Description | Target |
|---------|-------------|--------|
| **Direction Accuracy** | Accuracy on direction (up/down) | >= 55% |
| **Magnitude MAE** | Mean absolute error on magnitude | Minimize |
| **Directional Return** | Average return when following the predicted direction | > 0 |
| **Sharpe Proxy** | `mean(signed_returns) / std(signed_returns)` | > 1.0 |
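The primary metrics can be computed from out-of-sample predictions as follows (NumPy sketch; the array names are illustrative):

```python
import numpy as np

def pva_metrics(pred_direction, pred_magnitude, actual_returns):
    """Direction accuracy, magnitude MAE, directional return, Sharpe proxy."""
    pred_sign = np.sign(pred_direction)
    actual_sign = np.sign(actual_returns)
    direction_accuracy = np.mean(pred_sign == actual_sign)

    # Magnitude head predicts the absolute move
    magnitude_mae = np.mean(np.abs(pred_magnitude - np.abs(actual_returns)))

    # Return earned if we trade in the predicted direction
    signed_returns = pred_sign * actual_returns
    directional_return = signed_returns.mean()
    sharpe_proxy = signed_returns.mean() / signed_returns.std()

    return direction_accuracy, magnitude_mae, directional_return, sharpe_proxy

acc, mae, dir_ret, sharpe = pva_metrics(
    pred_direction=np.array([0.5, -0.3, 0.8, -0.1]),
    pred_magnitude=np.array([0.01, 0.02, 0.015, 0.005]),
    actual_returns=np.array([0.01, -0.02, 0.015, 0.005]),
)
```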
### Secondary Metrics
| Metric | Description |
|---------|-------------|
| **Encoder Loss** | MSE of the autoencoder |
| **Confidence Calibration** | Alignment between confidence and accuracy |
| **Per-Symbol Performance** | Metrics broken down per symbol |
---
## API and Usage
### Main Class: PVAModel
```python
from models.strategies.pva import PVAModel, PVAConfig

# Configuration
config = PVAConfig(
    input_features=15,
    sequence_length=100,
    d_model=256,
    n_heads=8,
    n_layers=4,
    d_ff=1024,
    dropout=0.1,
    device='cuda'
)

# Initialize the model
model = PVAModel(config)

# Train the encoder
history = model.fit_encoder(
    X_train, y_train,
    X_val, y_val,
    epochs=50,
    batch_size=64
)

# Train XGBoost
metrics = model.fit_xgboost(X_train, y_train, X_val, y_val)

# Predict
predictions = model.predict(X_new)
for pred in predictions:
    print(f"Direction: {pred.direction}, Magnitude: {pred.magnitude}")
```
### PVAPrediction Class
```python
@dataclass
class PVAPrediction:
    direction: float              # -1 to 1 (bearish to bullish)
    magnitude: float              # Expected absolute move
    confidence: float             # 0 to 1
    encoder_features: np.ndarray  # Latent representation

    @property
    def expected_return(self) -> float:
        return self.direction * self.magnitude

    @property
    def signal_strength(self) -> float:
        return abs(self.direction) * self.confidence
```
### PVATrainer Class
```python
from models.strategies.pva import PVATrainer, PVATrainerConfig

# Configure the trainer
config = PVATrainerConfig(
    timeframe='5m',
    sequence_length=100,
    target_horizon=12,
    encoder_epochs=50
)
trainer = PVATrainer(config)

# Train for a single symbol
model, metrics = trainer.train(
    symbol='XAUUSD',
    start_date='2023-01-01',
    end_date='2024-12-31'
)

# Walk-forward validation
results = trainer.walk_forward_train('XAUUSD', n_folds=5)
print(f"Avg Direction Accuracy: {results.avg_direction_accuracy:.2%}")

# Save the model
trainer.save_model(model, 'XAUUSD', 'v1.0.0')
```
---
## File Structure
```
apps/ml-engine/src/models/strategies/pva/
├── __init__.py
├── model.py # PVAModel, PVAConfig, PVAPrediction
├── feature_engineering.py # PVAFeatureEngineer, PVAFeatureConfig
├── trainer.py # PVATrainer, TrainingMetrics
└── attention.py # PriceVariationAttention encoder
```
---
## Production Considerations
### GPU Acceleration
```python
# Automatic GPU detection
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = PVAModel(config, device=device)

# XGBoost on GPU (XGBoost >= 2.0: 'gpu_hist' is deprecated in favor of
# tree_method='hist' combined with device='cuda')
xgb_params = {
    'tree_method': 'hist',
    'device': 'cuda'
}
```
### Model Versioning
```
models/pva/{symbol}/{version}/
├── encoder.pt # PyTorch encoder weights
├── xgb_direction.joblib # XGBoost direction classifier
├── xgb_magnitude.joblib # XGBoost magnitude regressor
├── config.json # Model configuration
├── metadata.json # Training metadata
└── feature_names.json # Feature column names
```
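A stdlib-only sketch of writing the JSON artifacts in that layout (the binary artifacts would be written with `torch.save` and `joblib.dump`, shown only as comments to keep the example dependency-free; the function name is illustrative):

```python
import json
import tempfile
from pathlib import Path

def save_model_artifacts(base_dir, symbol, version, config, metadata, feature_names):
    """Write the JSON artifacts of the models/pva/{symbol}/{version}/ layout."""
    out = Path(base_dir) / "pva" / symbol / version
    out.mkdir(parents=True, exist_ok=True)

    (out / "config.json").write_text(json.dumps(config, indent=2))
    (out / "metadata.json").write_text(json.dumps(metadata, indent=2))
    (out / "feature_names.json").write_text(json.dumps(feature_names, indent=2))

    # Binary artifacts (require torch / joblib):
    # torch.save(model.encoder.state_dict(), out / "encoder.pt")
    # joblib.dump(model.xgb_direction, out / "xgb_direction.joblib")
    # joblib.dump(model.xgb_magnitude, out / "xgb_magnitude.joblib")
    return out

out = save_model_artifacts(
    tempfile.mkdtemp(), "XAUUSD", "v1.0.0",
    config={"d_model": 256, "n_layers": 4},
    metadata={"trained_at": "2026-01-25", "timeframe": "5m"},
    feature_names=["return_1", "return_5", "volatility_returns"],
)
```

Keeping `feature_names.json` alongside the weights lets inference verify column order before building sequences, which guards against silent feature drift between versions.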
### Inference Batch Size
| Scenario | Recommended Batch Size |
|-----------|------------------------|
| Real-time single | 1 |
| Backtesting | 256 |
| Bulk inference | 1024 |
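Batched inference per the table above reduces per-call overhead during backtesting and bulk scoring; a simple chunking helper (sketch; `model.predict` stands in for `PVAModel.predict`):

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` of at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# e.g. backtesting with batch_size=256:
# for batch in iter_batches(sequences, 256):
#     predictions.extend(model.predict(batch))

batches = list(iter_batches(list(range(1000)), 256))
```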
---
## References
- [ET-ML-001: ML Engine Architecture](./ET-ML-001-arquitectura.md)
- [ET-ML-003: Feature Engineering](./ET-ML-003-features.md)
- [ET-ML-015: Backtesting Framework](./ET-ML-015-backtesting-framework.md)
- [Attention Is All You Need (Vaswani et al.)](https://arxiv.org/abs/1706.03762)
---
**Author:** ML-Specialist (NEXUS v4.0)
**Date:** 2026-01-25