---
id: "ET-ML-010"
title: "PVA (Price Variation Attention) Strategy"
type: "Technical Specification"
status: "Approved"
priority: "High"
epic: "OQI-006"
project: "trading-platform"
version: "1.0.0"
created_date: "2026-01-25"
updated_date: "2026-01-25"
task_reference: "TASK-2026-01-25-ML-TRAINING-ENHANCEMENT"
---

# ET-ML-010: PVA (Price Variation Attention) Strategy

## Metadata

| Field | Value |
|-------|-------|
| **ID** | ET-ML-010 |
| **Epic** | OQI-006 - ML Signals |
| **Type** | Technical Specification |
| **Version** | 1.0.0 |
| **Status** | Approved |
| **Last Updated** | 2026-01-25 |
| **Task Reference** | TASK-2026-01-25-ML-TRAINING-ENHANCEMENT |

---

## Summary

The PVA (Price Variation Attention) strategy is a hybrid model that combines a **Transformer Encoder** for representation learning with **XGBoost** for the final predictions. The model predicts both the **direction** and the **magnitude** of price variations over a future horizon.

### Key Characteristics

- **Time-Agnostic Design**: no temporal features (hour, day) are used, to avoid overfitting
- **6 Independent Models**: one model per symbol (XAUUSD, EURUSD, GBPUSD, USDJPY, BTCUSD, ETHUSD)
- **Hybrid Architecture**: Transformer Encoder + XGBoost head
- **Prediction Targets**: direction (bullish/bearish) and magnitude of the move

---

## Architecture

### High-Level Diagram

```
Input: OHLCV Sequence (seq_len x n_features)
  └── Returns, Acceleration, Volatility features
        │
        ▼
TRANSFORMER ENCODER
  Input Linear Projection ──▶ Positional Encoding ──▶ Encoder Layers (4)
        │
        ▼
  Sequence Pooling (Mean + Last)
        │
        ▼
XGBOOST HEAD
  Direction Classifier (binary: up/down)
  Magnitude Regressor (absolute magnitude)
        │
        ▼
Output: PVAPrediction
  - direction: float (-1 to 1, bearish to bullish)
  - magnitude: float (absolute expected move)
  - confidence: float (0 to 1)
```

### Transformer Encoder Components

| Component | Configuration |
|-----------|---------------|
| **Layers** | 4 encoder layers |
| **d_model** | 256 |
| **n_heads** | 8 attention heads |
| **d_ff** | 1024 (feed-forward dimension) |
| **dropout** | 0.1 |
| **positional_encoding** | Sinusoidal |
| **sequence_length** | 100 candles |

### XGBoost Configuration

| Parameter | Value |
|-----------|-------|
| **n_estimators** | 200 |
| **max_depth** | 6 |
| **learning_rate** | 0.05 |
| **subsample** | 0.8 |
| **colsample_bytree** | 0.8 |
| **reg_alpha** | 0.1 |
| **reg_lambda** | 1.0 |
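
As an illustration, the encoder table above maps almost directly onto PyTorch's built-in transformer modules. This is a minimal sketch under stated assumptions, not the production implementation: the class name `PVAEncoder` is hypothetical, and the sinusoidal positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn


class PVAEncoder(nn.Module):
    """Sketch of the encoder configuration above (hypothetical class name).

    Sinusoidal positional encoding from the table is omitted here for brevity.
    """

    def __init__(self, input_features: int = 15, d_model: int = 256,
                 n_heads: int = 8, n_layers: int = 4, d_ff: int = 1024,
                 dropout: float = 0.1):
        super().__init__()
        # Input linear projection: n_features -> d_model
        self.input_proj = nn.Linear(input_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ff,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_features)
        h = self.encoder(self.input_proj(x))  # (batch, seq_len, d_model)
        # Sequence pooling: concatenate mean over time and the last hidden state
        return torch.cat([h.mean(dim=1), h[:, -1]], dim=-1)  # (batch, 2*d_model)


x = torch.randn(4, 100, 15)
features = PVAEncoder()(x)
print(features.shape)  # torch.Size([4, 512])
```

The pooled `2 * d_model` vector is what the XGBoost head would consume as tabular input.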
---

## Feature Engineering

### Time-Agnostic Design

The model **does not use temporal features** (hour of day, day of week) in order to:

- Avoid overfitting to specific temporal patterns
- Improve generalization across different market conditions
- Reduce the risk of concept drift

### Implemented Features

```python
PVAFeatureConfig:
    return_periods: [1, 5, 10, 20]
    volatility_window: 20
    stats_window: 50
    sequence_length: 100
```

#### 1. Return Features

| Feature | Description | Formula |
|---------|-------------|---------|
| `return_1` | 1-period return | `(close / close.shift(1)) - 1` |
| `return_5` | 5-period return | `(close / close.shift(5)) - 1` |
| `return_10` | 10-period return | `(close / close.shift(10)) - 1` |
| `return_20` | 20-period return | `(close / close.shift(20)) - 1` |

#### 2. Acceleration Features

| Feature | Description | Formula |
|---------|-------------|---------|
| `acceleration_1` | Change in short-term momentum | `return_1 - return_1.shift(1)` |
| `acceleration_5` | Change in medium-term momentum | `return_5 - return_5.shift(5)` |
| `acceleration_20` | Change in long-term momentum | `return_20 - return_20.shift(20)` |

#### 3. Volatility Features

| Feature | Description | Formula |
|---------|-------------|---------|
| `volatility_returns` | Volatility of returns | `return_1.rolling(20).std()` |
| `volatility_ratio` | Current/average volatility ratio | `volatility / volatility.rolling(50).mean()` |
| `range_volatility` | Volatility of ranges | `((high - low) / close).rolling(20).std()` |

#### 4. Statistical Features

| Feature | Description | Formula |
|---------|-------------|---------|
| `return_skew` | Skewness of returns | `return_1.rolling(50).skew()` |
| `return_kurt` | Kurtosis of returns | `return_1.rolling(50).kurt()` |
| `zscore_return` | Z-score of the return | `(return_1 - mean) / std` |
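
The formulas above translate almost directly into pandas. The sketch below is illustrative only — the actual implementation lives in `feature_engineering.py` as `PVAFeatureEngineer`, and the helper name `compute_pva_features` is hypothetical.

```python
import numpy as np
import pandas as pd


def compute_pva_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative sketch of the feature formulas above.

    Assumes a DataFrame with 'close', 'high', and 'low' columns.
    """
    out = pd.DataFrame(index=df.index)

    # 1. Returns over multiple horizons
    for p in (1, 5, 10, 20):
        out[f'return_{p}'] = (df['close'] / df['close'].shift(p)) - 1

    # 2. Acceleration: change in momentum at each horizon
    for p in (1, 5, 20):
        out[f'acceleration_{p}'] = out[f'return_{p}'] - out[f'return_{p}'].shift(p)

    # 3. Volatility features (windows 20 and 50 as configured)
    out['volatility_returns'] = out['return_1'].rolling(20).std()
    out['volatility_ratio'] = (
        out['volatility_returns'] / out['volatility_returns'].rolling(50).mean())
    out['range_volatility'] = ((df['high'] - df['low']) / df['close']).rolling(20).std()

    # 4. Statistical features over the stats window (50)
    out['return_skew'] = out['return_1'].rolling(50).skew()
    out['return_kurt'] = out['return_1'].rolling(50).kurt()
    roll = out['return_1'].rolling(50)
    out['zscore_return'] = (out['return_1'] - roll.mean()) / roll.std()

    return out


# Synthetic random-walk prices, just to exercise the function
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))))
df = pd.DataFrame({'close': close, 'high': close * 1.001, 'low': close * 0.999})
features = compute_pva_features(df)
print(features.shape)  # (300, 13)
```

The leading rows contain NaNs from the shifts and rolling windows; in training they would be dropped before building sequences.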
---

## Training Pipeline

### Training Flow

```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    Data     │───▶│   Feature   │───▶│   Encoder   │───▶│   XGBoost   │
│   Loading   │    │ Engineering │    │  Training   │    │  Training   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                                                │
                                                                ▼
                                                      ┌─────────────────┐
                                                      │  Validation &   │
                                                      │  Model Saving   │
                                                      └─────────────────┘
```

### Trainer Configuration

```python
PVATrainerConfig:
    # Data
    timeframe: '5m'
    batch_size: 64
    sequence_length: 100
    target_horizon: 12  # candles ahead

    # Training
    encoder_epochs: 50
    encoder_learning_rate: 1e-4
    early_stopping_patience: 10

    # Validation
    val_ratio: 0.15
    walk_forward_splits: 5
    min_train_size: 10000
```

### Walk-Forward Validation

The model uses **walk-forward validation** to evaluate performance:

```
Time ──────────────────────────────────────────────────────────────▶

Fold 1: [========= TRAIN =========][TEST]
Fold 2: [============= TRAIN =============][TEST]
Fold 3: [================= TRAIN =================][TEST]
Fold 4: [===================== TRAIN =====================][TEST]
Fold 5: [========================= TRAIN =========================][TEST]
```

**Characteristics:**

- Expanding window
- 5 folds by default
- Optional gap between train and test
- Metrics aggregated per fold
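
The expanding-window scheme can be sketched as a plain index generator. The helper below is a hypothetical illustration of the split geometry, not the trainer's actual implementation; parameter names mirror `PVATrainerConfig` where possible.

```python
def walk_forward_splits(n_samples: int, n_folds: int = 5,
                        min_train_size: int = 0, gap: int = 0):
    """Expanding-window splits as in the diagram above (illustrative sketch).

    Each fold trains on everything up to a cut point and tests on the
    following segment; the training window grows with every fold.
    """
    test_size = (n_samples - min_train_size) // (n_folds + 1)
    splits = []
    for k in range(1, n_folds + 1):
        train_end = min_train_size + k * test_size
        test_start = train_end + gap          # optional gap between train and test
        test_end = min(test_start + test_size, n_samples)
        splits.append((range(0, train_end), range(test_start, test_end)))
    return splits


for train_idx, test_idx in walk_forward_splits(1200, n_folds=5):
    print(len(train_idx), len(test_idx))
```

With 1200 samples and 5 folds, the training window grows from 200 to 1000 samples while each test segment stays 200 samples long, matching the picture above.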
---

## Evaluation Metrics

### Primary Metrics

| Metric | Description | Target |
|--------|-------------|--------|
| **Direction Accuracy** | Accuracy on direction (up/down) | >= 55% |
| **Magnitude MAE** | Mean absolute error on magnitude | Minimize |
| **Directional Return** | Average return when trading the predicted direction | > 0 |
| **Sharpe Proxy** | `mean(signed_returns) / std(signed_returns)` | > 1.0 |

### Secondary Metrics

| Metric | Description |
|--------|-------------|
| **Encoder Loss** | Autoencoder MSE |
| **Confidence Calibration** | Alignment of confidence vs. accuracy |
| **Per-Symbol Performance** | Metrics broken down per symbol |
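
As a sketch of how the primary metrics can be computed from predictions and realized returns — the helper `pva_metrics` is hypothetical and not part of the documented API:

```python
import numpy as np


def pva_metrics(pred_direction, pred_magnitude, realized_returns):
    """Illustrative computation of the primary metrics above."""
    pred_direction = np.asarray(pred_direction, dtype=float)
    pred_magnitude = np.asarray(pred_magnitude, dtype=float)
    realized = np.asarray(realized_returns, dtype=float)

    # Direction accuracy: fraction of predictions with the correct sign
    direction_accuracy = np.mean(np.sign(pred_direction) == np.sign(realized))

    # Magnitude MAE against the absolute realized move
    magnitude_mae = np.mean(np.abs(pred_magnitude - np.abs(realized)))

    # Returns earned by trading in the predicted direction
    signed_returns = np.sign(pred_direction) * realized
    directional_return = signed_returns.mean()

    # Sharpe proxy as defined above: mean(signed_returns) / std(signed_returns)
    sharpe_proxy = signed_returns.mean() / signed_returns.std()

    return direction_accuracy, magnitude_mae, directional_return, sharpe_proxy


acc, mae, dret, sharpe = pva_metrics(
    pred_direction=[0.8, -0.3, 0.5, -0.9],
    pred_magnitude=[0.004, 0.002, 0.003, 0.005],
    realized_returns=[0.005, 0.001, 0.002, -0.006],
)
print(f"accuracy={acc:.2f}")  # accuracy=0.75
```

Note that the Sharpe proxy here is per-prediction, not annualized; it is a relative ranking signal rather than a portfolio Sharpe ratio.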
---

## API and Usage

### Main Class: PVAModel

```python
from models.strategies.pva import PVAModel, PVAConfig

# Configuration
config = PVAConfig(
    input_features=15,
    sequence_length=100,
    d_model=256,
    n_heads=8,
    n_layers=4,
    d_ff=1024,
    dropout=0.1,
    device='cuda'
)

# Initialize the model
model = PVAModel(config)

# Train the encoder
history = model.fit_encoder(
    X_train, y_train,
    X_val, y_val,
    epochs=50,
    batch_size=64
)

# Train XGBoost
metrics = model.fit_xgboost(X_train, y_train, X_val, y_val)

# Prediction
predictions = model.predict(X_new)
for pred in predictions:
    print(f"Direction: {pred.direction}, Magnitude: {pred.magnitude}")
```

### PVAPrediction Class

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PVAPrediction:
    direction: float              # -1 to 1 (bearish to bullish)
    magnitude: float              # Expected absolute move
    confidence: float             # 0 to 1
    encoder_features: np.ndarray  # Latent representation

    @property
    def expected_return(self) -> float:
        return self.direction * self.magnitude

    @property
    def signal_strength(self) -> float:
        return abs(self.direction) * self.confidence
```

### PVATrainer Class

```python
from models.strategies.pva import PVATrainer, PVATrainerConfig

# Configure the trainer
config = PVATrainerConfig(
    timeframe='5m',
    sequence_length=100,
    target_horizon=12,
    encoder_epochs=50
)

trainer = PVATrainer(config)

# Train for a single symbol
model, metrics = trainer.train(
    symbol='XAUUSD',
    start_date='2023-01-01',
    end_date='2024-12-31'
)

# Walk-forward validation
results = trainer.walk_forward_train('XAUUSD', n_folds=5)
print(f"Avg Direction Accuracy: {results.avg_direction_accuracy:.2%}")

# Save the model
trainer.save_model(model, 'XAUUSD', 'v1.0.0')
```
---

## File Structure

```
apps/ml-engine/src/models/strategies/pva/
├── __init__.py
├── model.py                # PVAModel, PVAConfig, PVAPrediction
├── feature_engineering.py  # PVAFeatureEngineer, PVAFeatureConfig
├── trainer.py              # PVATrainer, TrainingMetrics
└── attention.py            # PriceVariationAttention encoder
```
---

## Production Considerations

### GPU Acceleration

```python
import torch

# Automatic GPU detection
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = PVAModel(config, device=device)

# XGBoost with GPU. Since XGBoost 2.0 the 'gpu_hist' tree method is
# deprecated; use tree_method='hist' together with device='cuda'.
xgb_params = {
    'tree_method': 'hist',
    'device': 'cuda'
}
```

### Model Versioning

```
models/pva/{symbol}/{version}/
├── encoder.pt             # PyTorch encoder weights
├── xgb_direction.joblib   # XGBoost direction classifier
├── xgb_magnitude.joblib   # XGBoost magnitude regressor
├── config.json            # Model configuration
├── metadata.json          # Training metadata
└── feature_names.json     # Feature column names
```

### Inference Batch Size

| Scenario | Recommended Batch Size |
|----------|------------------------|
| Real-time single | 1 |
| Backtesting | 256 |
| Bulk inference | 1024 |
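
The versioned layout above can be produced with a small helper. This is an illustrative sketch — the function name `save_versioned` is hypothetical; the binary artifacts (`encoder.pt`, `*.joblib`) would be written separately by `torch.save` and `joblib.dump`.

```python
import json
from pathlib import Path


def save_versioned(model_dir: str, symbol: str, version: str,
                   config: dict, metadata: dict, feature_names: list) -> Path:
    """Write the JSON artifacts of one model version (hypothetical helper).

    Produces models/pva-style paths: {model_dir}/{symbol}/{version}/...
    """
    root = Path(model_dir) / symbol / version
    root.mkdir(parents=True, exist_ok=True)
    (root / 'config.json').write_text(json.dumps(config, indent=2))
    (root / 'metadata.json').write_text(json.dumps(metadata, indent=2))
    (root / 'feature_names.json').write_text(json.dumps(feature_names))
    return root


root = save_versioned('models/pva', 'XAUUSD', 'v1.0.0',
                      config={'d_model': 256},
                      metadata={'trained': '2026-01-25'},
                      feature_names=['return_1', 'return_5'])
print(root)
```

Keeping `feature_names.json` alongside the weights lets inference validate that incoming feature columns match what the model was trained on before predicting.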
---

## References

- [ET-ML-001: ML Engine Architecture](./ET-ML-001-arquitectura.md)
- [ET-ML-003: Feature Engineering](./ET-ML-003-features.md)
- [ET-ML-015: Backtesting Framework](./ET-ML-015-backtesting-framework.md)
- [Attention Is All You Need (Vaswani et al.)](https://arxiv.org/abs/1706.03762)

---

**Author:** ML-Specialist (NEXUS v4.0)

**Date:** 2026-01-25