---
id: "ADR-006-caching"
title: "Caching Strategy"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---

# ADR-006: Caching Strategy

**Status:** Accepted
**Date:** 2025-12-06
**Deciders:** Tech Lead, Architect
**Related:** ADR-001

---
## Context

Trading Platform needs to optimize performance in several areas:

1. **ML Predictions**: Models take 200-500ms; users expect < 100ms
2. **Market Data**: Trading APIs are limited to 100 requests/min
3. **User Sessions**: JWT validation on every request is costly
4. **Rate Limiting**: Prevent API abuse (DDoS, brute force)
5. **Leaderboards**: Complex queries aggregating data across many users
6. **Historical Data**: Historical data rarely changes

Performance requirements:

- API responses < 200ms (p95)
- ML predictions < 100ms (cached)
- Support for 10K concurrent users (post-MVP)

---
## Decision

### Cache Layer: Redis 7.x

**Redis** as the primary cache because:

- In-memory: < 1ms latency
- Pub/Sub: for distributed cache invalidation
- Data structures: Lists, Sets, and Sorted Sets for leaderboards
- Native TTL: automatic expiration
- Optional persistence: RDB snapshots for disaster recovery
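For the leaderboard case, the Sorted Set fit can be illustrated with a minimal in-memory sketch of the ZADD/ZREVRANGE semantics the design relies on. This is illustrative only, not code from the repo: the `scores` map stands in for Redis.

```typescript
// In-memory sketch of how a Sorted Set backs a leaderboard
// (ZADD / ZREVRANGE semantics; real code would call Redis).
const scores = new Map<string, number>();

// ZADD: upsert a member with its score.
const zadd = (member: string, score: number) => scores.set(member, score);

// ZREVRANGE 0 (n-1): top-n members by descending score.
const zrevrange = (n: number): string[] =>
  [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([member]) => member);

zadd('user:1', 120);
zadd('user:2', 340);
zadd('user:3', 210);
// zrevrange(2) → ['user:2', 'user:3']
```

In Redis this stays O(log N) per update with no aggregation query at read time, which is what makes the 10-minute cached leaderboard cheap to rebuild.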
### Cache Strategy per Data Type
| Data Type | Strategy | TTL | Invalidation |
|-----------|----------|-----|--------------|
| **Sessions** | Write-through | 7 days | Logout, password change |
| **ML Predictions** | Cache-aside | 5 min | Model retrain |
| **Market Data** | Cache-aside | 1 min | Webhook from broker API |
| **User Profile** | Write-through | 1 hour | Profile update |
| **Leaderboards** | Cache-aside | 10 min | Cron job rebuild |
| **Historical OHLCV** | Cache-aside | 24 hours | Never (immutable) |
| **Rate Limit** | Counter | 1 min | Auto-expire |

### TTL Configuration
```typescript
// apps/backend/src/config/cache.ts
export const CACHE_TTL = {
  SESSION: 60 * 60 * 24 * 7,      // 7 days
  ML_PREDICTION: 60 * 5,          // 5 minutes
  MARKET_DATA: 60,                // 1 minute
  USER_PROFILE: 60 * 60,          // 1 hour
  LEADERBOARD: 60 * 10,           // 10 minutes
  OHLCV_HISTORICAL: 60 * 60 * 24, // 24 hours
  RATE_LIMIT: 60,                 // 1 minute
} as const;
```
### Redis Key Naming Convention

```
{app}:{entity}:{id}:{version}

Examples:
- session:user:123abc:v1
- ml:prediction:AAPL:1d:v2
- market:ohlcv:TSLA:2025-12-06:v1
- user:profile:456def:v1
- leaderboard:monthly:2025-12:v1
- ratelimit:api:192.168.1.1:v1
```
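To keep keys from drifting away from this convention, they can be centralized in typed builders. A minimal sketch (the `keys` object is a hypothetical helper, not existing code; the formats match the examples above):

```typescript
// Hypothetical typed key builders mirroring the naming convention above.
// Centralizing them means a version bump touches one line, not many call sites.
const keys = {
  session: (userId: string) => `session:user:${userId}:v1`,
  mlPrediction: (symbol: string, timeframe: string) =>
    `ml:prediction:${symbol}:${timeframe}:v2`,
  ohlcv: (symbol: string, date: string) => `market:ohlcv:${symbol}:${date}:v1`,
  userProfile: (userId: string) => `user:profile:${userId}:v1`,
  rateLimit: (ip: string) => `ratelimit:api:${ip}:v1`,
} as const;

// keys.mlPrediction('AAPL', '1d') → 'ml:prediction:AAPL:1d:v2'
```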
### Cache-Aside Pattern (Read-Heavy)

```typescript
// apps/backend/src/services/market.service.ts
async function getOHLCV(symbol: string, date: string) {
  const cacheKey = `market:ohlcv:${symbol}:${date}:v1`;

  // 1. Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2. Cache miss → fetch from DB
  const data = await db.ohlcv.findUnique({
    where: { symbol, date }
  });

  // 3. Store in cache
  await redis.setex(
    cacheKey,
    CACHE_TTL.OHLCV_HISTORICAL,
    JSON.stringify(data)
  );

  return data;
}
```
### Write-Through Pattern (Write-Heavy)

```typescript
// apps/backend/src/services/user.service.ts
async function updateProfile(userId: string, data: ProfileUpdate) {
  const cacheKey = `user:profile:${userId}:v1`;

  // 1. Write to DB first
  const updated = await db.user.update({
    where: { id: userId },
    data
  });

  // 2. Update cache
  await redis.setex(
    cacheKey,
    CACHE_TTL.USER_PROFILE,
    JSON.stringify(updated)
  );

  return updated;
}
```
### Rate Limiting

```typescript
// apps/backend/src/middleware/rateLimit.ts
async function rateLimit(req: Request, res: Response, next: NextFunction) {
  const ip = req.ip;
  const key = `ratelimit:api:${ip}:v1`;

  const current = await redis.incr(key);

  if (current === 1) {
    await redis.expire(key, CACHE_TTL.RATE_LIMIT);
  }

  if (current > 100) { // 100 requests per minute
    return res.status(429).json({
      error: 'Too many requests, try again later'
    });
  }

  next();
}
```
### ML Predictions Cache

```python
# apps/ml-engine/cache.py
import redis
import json
from datetime import timedelta

redis_client = redis.Redis(host='localhost', port=6379)

async def get_prediction(symbol: str, timeframe: str):
    cache_key = f"ml:prediction:{symbol}:{timeframe}:v2"

    # Try cache
    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)

    # Generate prediction (expensive)
    prediction = await model.predict(symbol, timeframe)

    # Cache for 5 minutes
    redis_client.setex(
        cache_key,
        timedelta(minutes=5),
        json.dumps(prediction)
    )

    return prediction
```
### Cache Invalidation

```typescript
// apps/backend/src/services/cache.service.ts
class CacheInvalidator {
  // Invalidate single key
  async invalidate(key: string) {
    await redis.del(key);
  }

  // Invalidate pattern (use with caution: KEYS is O(N) and blocks
  // Redis while it scans; prefer SCAN for large keyspaces)
  async invalidatePattern(pattern: string) {
    const keys = await redis.keys(pattern);
    if (keys.length > 0) {
      await redis.del(...keys);
    }
  }

  // Invalidate on model retrain
  async invalidateMLPredictions() {
    await this.invalidatePattern('ml:prediction:*');
  }

  // Invalidate user session
  async invalidateSession(userId: string) {
    await this.invalidatePattern(`session:user:${userId}:*`);
  }
}
```

---
## Consequences

### Positive

1. **Performance**: API responses < 100ms (cached) vs 500ms (uncached)
2. **Cost Savings**: 90% fewer requests to external market data APIs
3. **Scalability**: Redis supports 100K ops/sec on modest hardware
4. **Rate Limiting**: Prevents abuse without affecting legitimate users
5. **ML Efficiency**: Cached predictions reduce GPU load
6. **Developer Experience**: Redis CLI for cache debugging
7. **Session Management**: Instant logout via invalidation

### Negative

1. **Stale Data**: The cache can serve outdated data for up to the TTL
2. **Memory Cost**: Redis requires ~2GB RAM for 10K active users
3. **Complexity**: Cache invalidation is "one of the two hard problems"
4. **Cold Start**: The first requests are slow (cache warming needed)
5. **Thundering Herd**: Simultaneous requests when a cache entry expires
6. **Redis SPOF**: If Redis goes down, performance degrades (no outage, but slow)

### Risks and Mitigations
| Risk | Mitigation |
|------|-----------|
| Cache invalidation bugs | Unit tests + cache integration tests |
| Thundering herd | Implement cache locking (SETNX pattern) |
| Redis down | Graceful degradation: app works without cache |
| Memory exhausted | Eviction policy: `allkeys-lru` |
| Stale ML predictions | Short TTL (5 min) + manual invalidation |
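The SETNX cache-locking mitigation can be sketched as below. This is illustrative, not repo code: `store` is an in-memory stand-in for Redis (SETNX/GET/SET/DEL), and `getWithLock` is a hypothetical name. In real Redis the lock would be taken with `SET lock:{key} 1 NX EX <ttl>` so a crashed holder cannot leave the lock stuck forever.

```typescript
// In-memory stand-in for Redis SETNX / GET / SET / DEL (illustrative only).
const store = new Map<string, string>();
const setnx = (key: string, value: string): boolean =>
  store.has(key) ? false : (store.set(key, value), true);

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Cache-aside with a lock: only one caller recomputes an expired entry;
// the rest wait briefly and re-read the cache instead of stampeding the DB.
async function getWithLock(
  key: string,
  compute: () => Promise<string>,
): Promise<string> {
  const cached = store.get(key);
  if (cached !== undefined) return cached;

  if (setnx(`lock:${key}`, '1')) {
    try {
      const value = await compute(); // expensive recomputation, done once
      store.set(key, value);
      return value;
    } finally {
      store.delete(`lock:${key}`);   // release lock (EX ttl in real Redis)
    }
  }

  // Lock held by another caller: poll until the value appears.
  for (let i = 0; i < 50; i++) {
    await sleep(10);
    const v = store.get(key);
    if (v !== undefined) return v;
  }
  return compute(); // fallback: lock holder died; compute directly
}
```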
---
## Alternatives Considered

### 1. Memcached
- **Pros**: Simple, very fast, lower memory usage than Redis
- **Cons**: No persistence, no data structures, no pub/sub
- **Decision**: ❌ Rejected - Redis offers more features at the same cost

### 2. Application-Level Cache (Node.js Map)
- **Pros**: Zero latency, no external dependency
- **Cons**: Not distributed, lost on app restart
- **Decision**: ❌ Rejected - Does not scale to multiple instances

### 3. CDN Caching (Cloudflare/CloudFront)
- **Pros**: Global distribution, DDoS protection
- **Cons**: Only for static assets, does not work for the API
- **Decision**: ⚠️ Complementary - Use for the frontend, not for the API

### 4. Database Query Cache (PostgreSQL)
- **Pros**: Already included in Postgres
- **Cons**: Limited, not configurable per query
- **Decision**: ❌ Insufficient - We need more aggressive caching

### 5. GraphQL + DataLoader
- **Pros**: Automatic batching, per-request cache
- **Cons**: Requires migrating from REST to GraphQL
- **Decision**: ❌ Rejected - REST is sufficient for the MVP

### 6. No Cache (Optimize DB Queries)
- **Pros**: Less complexity, no stale data
- **Cons**: Impossible to reach < 200ms for ML predictions
- **Decision**: ❌ Rejected - Performance requirements unattainable
---

## Cache Warming Strategy
```typescript
// apps/backend/src/jobs/cache-warmer.ts
import cron from 'node-cron';

// Run every 5 minutes
cron.schedule('*/5 * * * *', async () => {
  const topSymbols = ['AAPL', 'TSLA', 'GOOGL', 'MSFT', 'AMZN'];

  for (const symbol of topSymbols) {
    // Pre-cache ML predictions for popular symbols
    await mlService.getPrediction(symbol, '1d');
    await mlService.getPrediction(symbol, '1w');

    // Pre-cache market data
    await marketService.getOHLCV(symbol, today());
  }

  // Pre-cache leaderboards
  await leaderboardService.getMonthly();
});
```

---
## Monitoring

### Redis Metrics to Track

- Hit/Miss Ratio (target > 80%)
- Evicted Keys (should be 0)
- Memory Usage (alert at > 80%)
- Connected Clients
- Commands/sec
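The hit/miss target can be computed from the `keyspace_hits` and `keyspace_misses` counters that Redis reports in `INFO stats`. A minimal sketch, assuming the raw INFO payload has already been fetched (`hitRatio` is an illustrative helper, not existing code):

```typescript
// Parse keyspace_hits / keyspace_misses out of a raw `INFO stats` payload
// and compute the hit ratio checked against the > 80% target above.
function hitRatio(infoStats: string): number {
  const num = (field: string): number => {
    const m = infoStats.match(new RegExp(`${field}:(\\d+)`));
    return m ? Number(m[1]) : 0;
  };
  const hits = num('keyspace_hits');
  const misses = num('keyspace_misses');
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

// hitRatio('keyspace_hits:900\r\nkeyspace_misses:100') → 0.9
```

Note these counters are cumulative since server start, so an alert should track the ratio over a sliding window rather than the lifetime value.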
### Logging

```typescript
// Log cache hits/misses
logger.info('cache_hit', { key, ttl: remaining });
logger.warn('cache_miss', { key, reason: 'expired' });
```

---
## References

- [Redis Documentation](https://redis.io/docs/)
- [Cache Strategies](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Strategies.html)
- [Cache Invalidation Patterns](https://martinfowler.com/bliki/TwoHardThings.html)
- [Redis Best Practices](https://redis.io/docs/manual/patterns/)