---
id: "ADR-006-caching"
title: "Caching Strategy"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---

# ADR-006: Caching Strategy

**Status:** Accepted
**Date:** 2025-12-06
**Deciders:** Tech Lead, Architect
**Related:** ADR-001

---

## Context

Trading Platform needs to optimize performance in several areas:

1. **ML Predictions**: Models take 200-500ms; users expect < 100ms
2. **Market Data**: Trading APIs are limited to 100 requests/min
3. **User Sessions**: JWT validation on every request is costly
4. **Rate Limiting**: Prevent API abuse (DDoS, brute force)
5. **Leaderboards**: Complex queries that aggregate data across many users
6. **Historical Data**: Historical data rarely changes

Performance requirements:

- API responses < 200ms (p95)
- ML predictions < 100ms (cached)
- Support for 10K concurrent users (post-MVP)

---

## Decision

### Cache Layer: Redis 7.x

**Redis** as the primary cache, because:

- In-memory: latency < 1ms
- Pub/Sub: for distributed cache invalidation
- Data structures: Lists, Sets, and Sorted Sets for leaderboards
- Native TTL: automatic expiration
- Optional persistence: RDB snapshots for disaster recovery

### Cache Strategy by Data Type

| Data Type | Strategy | TTL | Invalidation |
|-----------|----------|-----|--------------|
| **Sessions** | Write-through | 7 days | Logout, password change |
| **ML Predictions** | Cache-aside | 5 min | Model retrain |
| **Market Data** | Cache-aside | 1 min | Webhook from broker API |
| **User Profile** | Write-through | 1 hour | Profile update |
| **Leaderboards** | Cache-aside | 10 min | Cron job rebuild |
| **Historical OHLCV** | Cache-aside | 24 hours | Never (immutable) |
| **Rate Limit** | Counter | 1 min | Auto-expire |

### TTL Configuration

```typescript
// apps/backend/src/config/cache.ts
export const CACHE_TTL = {
  SESSION: 60 * 60 * 24 * 7,       // 7 days
  ML_PREDICTION: 60 * 5,           // 5 minutes
  MARKET_DATA:
    60,                            // 1 minute
  USER_PROFILE: 60 * 60,           // 1 hour
  LEADERBOARD: 60 * 10,            // 10 minutes
  OHLCV_HISTORICAL: 60 * 60 * 24,  // 24 hours
  RATE_LIMIT: 60,                  // 1 minute
} as const;
```

### Redis Key Naming Convention

```
{app}:{entity}:{id}:{version}

Examples:
- session:user:123abc:v1
- ml:prediction:AAPL:1d:v2
- market:ohlcv:TSLA:2025-12-06:v1
- user:profile:456def:v1
- leaderboard:monthly:2025-12:v1
- ratelimit:api:192.168.1.1:v1
```

### Cache-Aside Pattern (Read-Heavy)

```typescript
// apps/backend/src/services/market.service.ts
async function getOHLCV(symbol: string, date: string) {
  const cacheKey = `market:ohlcv:${symbol}:${date}:v1`;

  // 1. Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2. Cache miss → fetch from DB
  const data = await db.ohlcv.findUnique({
    where: { symbol, date },
  });

  // 3. Store in cache (skip null so a missing row is not cached)
  if (data) {
    await redis.setex(
      cacheKey,
      CACHE_TTL.OHLCV_HISTORICAL,
      JSON.stringify(data)
    );
  }

  return data;
}
```

### Write-Through Pattern (Write-Heavy)

```typescript
// apps/backend/src/services/user.service.ts
async function updateProfile(userId: string, data: ProfileUpdate) {
  const cacheKey = `user:profile:${userId}:v1`;

  // 1. Write to DB first
  const updated = await db.user.update({
    where: { id: userId },
    data,
  });
  // 2. Update cache
  await redis.setex(
    cacheKey,
    CACHE_TTL.USER_PROFILE,
    JSON.stringify(updated)
  );

  return updated;
}
```

### Rate Limiting

```typescript
// apps/backend/src/middleware/rateLimit.ts
async function rateLimit(req: Request, res: Response, next: NextFunction) {
  const ip = req.ip;
  const key = `ratelimit:api:${ip}:v1`;

  const current = await redis.incr(key);
  if (current === 1) {
    // First request in the window: start the 60s TTL
    await redis.expire(key, CACHE_TTL.RATE_LIMIT);
  }

  if (current > 100) { // 100 requests per minute
    return res.status(429).json({
      error: 'Too many requests, try again later',
    });
  }

  next();
}
```

### ML Predictions Cache

```python
# apps/ml-engine/cache.py
import json
from datetime import timedelta

import redis.asyncio as redis  # async client, so cache calls don't block

redis_client = redis.Redis(host='localhost', port=6379)

async def get_prediction(symbol: str, timeframe: str):
    cache_key = f"ml:prediction:{symbol}:{timeframe}:v2"

    # Try cache
    cached = await redis_client.get(cache_key)
    if cached:
        return json.loads(cached)

    # Generate prediction (expensive); `model` is provided by the ML engine
    prediction = await model.predict(symbol, timeframe)

    # Cache for 5 minutes
    await redis_client.setex(
        cache_key,
        timedelta(minutes=5),
        json.dumps(prediction),
    )
    return prediction
```

### Cache Invalidation

```typescript
// apps/backend/src/services/cache.service.ts
class CacheInvalidator {
  // Invalidate a single key
  async invalidate(key: string) {
    await redis.del(key);
  }

  // Invalidate by pattern (use with caution: KEYS blocks Redis,
  // prefer incremental SCAN in production)
  async invalidatePattern(pattern: string) {
    const keys = await redis.keys(pattern);
    if (keys.length > 0) {
      await redis.del(...keys);
    }
  }

  // Invalidate on model retrain
  async invalidateMLPredictions() {
    await this.invalidatePattern('ml:prediction:*');
  }

  // Invalidate user session
  async invalidateSession(userId: string) {
    await this.invalidatePattern(`session:user:${userId}:*`);
  }
}
```

---

## Consequences

### Positive

1. **Performance**: API responses < 100ms (cached) vs 500ms (uncached)
2. **Cost Savings**: ~90% fewer requests to external market data APIs
3. **Scalability**: Redis handles 100K ops/sec on modest hardware
4. **Rate Limiting**: Prevents abuse without affecting legitimate users
5. **ML Efficiency**: Cached predictions reduce GPU load
6. **Developer Experience**: Redis CLI for cache debugging
7. **Session Management**: Instant logout via invalidation

### Negative

1. **Stale Data**: The cache may serve outdated data for up to the TTL
2. **Memory Cost**: Redis needs ~2GB of RAM for 10K active users
3. **Complexity**: Cache invalidation is "one of the two hard problems"
4. **Cold Start**: The first requests are slow (cache warming needed)
5. **Thundering Herd**: Simultaneous requests pile up when a cache entry expires
6. **Redis SPOF**: If Redis goes down, performance degrades (the app keeps working, just slowly)

### Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| Cache invalidation bugs | Unit tests + cache integration tests |
| Thundering herd | Implement cache locking (SETNX pattern) |
| Redis down | Graceful degradation: the app works without cache |
| Memory exhausted | Eviction policy: `allkeys-lru` |
| Stale ML predictions | Short TTL (5 min) + manual invalidation |

---

## Alternatives Considered

### 1. Memcached

- **Pros**: Simple, very fast, lower memory footprint than Redis
- **Cons**: No persistence, no data structures, no pub/sub
- **Decision**: ❌ Rejected - Redis offers more features at the same cost

### 2. Application-Level Cache (Node.js Map)

- **Pros**: Zero latency, no external dependency
- **Cons**: Not distributed, lost on app restart
- **Decision**: ❌ Rejected - Does not scale to multiple instances

### 3. CDN Caching (Cloudflare/CloudFront)

- **Pros**: Global distribution, DDoS protection
- **Cons**: Only for static assets; does not work for the API
- **Decision**: ⚠️ Complementary - Use for the frontend, not for the API
### 4. Database Query Cache (PostgreSQL)

- **Pros**: Already included in Postgres
- **Cons**: Limited, not configurable per query
- **Decision**: ❌ Insufficient - We need more aggressive caching

### 5. GraphQL + DataLoader

- **Pros**: Automatic batching, per-request cache
- **Cons**: Requires migrating from REST to GraphQL
- **Decision**: ❌ Rejected - REST is enough for the MVP

### 6. No Cache (Optimize DB Queries)

- **Pros**: Less complexity, no stale data
- **Cons**: Impossible to reach < 200ms for ML predictions
- **Decision**: ❌ Rejected - Performance requirements not achievable

---

## Cache Warming Strategy

```typescript
// apps/backend/src/jobs/cache-warmer.ts
import cron from 'node-cron';

// Run every 5 minutes
cron.schedule('*/5 * * * *', async () => {
  const topSymbols = ['AAPL', 'TSLA', 'GOOGL', 'MSFT', 'AMZN'];

  for (const symbol of topSymbols) {
    // Pre-cache ML predictions for popular symbols
    await mlService.getPrediction(symbol, '1d');
    await mlService.getPrediction(symbol, '1w');

    // Pre-cache market data
    await marketService.getOHLCV(symbol, today());
  }

  // Pre-cache leaderboards
  await leaderboardService.getMonthly();
});
```

---

## Monitoring

### Redis Metrics to Track

```
- Hit/Miss Ratio (target > 80%)
- Evicted Keys (should be 0)
- Memory Usage (alert at > 80%)
- Connected Clients
- Commands/sec
```

### Logging

```typescript
// Log cache hits/misses
logger.info('cache_hit', { key, ttl: remaining });
logger.warn('cache_miss', { key, reason: 'expired' });
```

---

## References

- [Redis Documentation](https://redis.io/docs/)
- [Cache Strategies](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Strategies.html)
- [Cache Invalidation Patterns](https://martinfowler.com/bliki/TwoHardThings.html)
- [Redis Best Practices](https://redis.io/docs/manual/patterns/)
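
---

## Appendix: Cache Locking Sketch (SETNX)

The mitigations table calls for cache locking via SETNX to tame the thundering herd. A minimal sketch of the pattern, using an in-memory `Map` as a stand-in for Redis so it is self-contained; in the real service `acquireLock` would be a single `SET key 1 NX EX 10` call, and the names `Store`, `acquireLock`, and `getWithLock` are illustrative, not part of the codebase:

```typescript
type Store = Map<string, string>;

// First caller wins the lock; everyone else is told to back off.
// Equivalent to `SET lockKey 1 NX` against real Redis.
function acquireLock(store: Store, lockKey: string): boolean {
  if (store.has(lockKey)) return false;
  store.set(lockKey, '1');
  return true;
}

// On a miss, only the lock holder rebuilds the value; other callers get
// null and should retry shortly instead of stampeding the database.
async function getWithLock(
  store: Store,
  key: string,
  rebuild: () => Promise<string>
): Promise<string | null> {
  const cached = store.get(key);
  if (cached !== undefined) return cached;

  if (!acquireLock(store, `${key}:lock`)) return null;
  try {
    const value = await rebuild();
    store.set(key, value);
    return value;
  } finally {
    store.delete(`${key}:lock`);
  }
}
```

A real implementation would put a TTL on the lock key (the `EX 10` above) so a crashed rebuilder cannot wedge the cache permanently.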
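
---

## Appendix: Graceful Degradation Sketch

The "Redis down" row in the mitigations table promises that the app keeps working without the cache. One way that wrapper could look, assuming a minimal `CacheClient` interface (illustrative, not the real client type):

```typescript
interface CacheClient {
  get(key: string): Promise<string | null>;
}

// If Redis is unreachable, fall through to the source of truth
// instead of failing the request.
async function getOrFallback<T>(
  cache: CacheClient,
  key: string,
  loadFromDb: () => Promise<T>
): Promise<T> {
  try {
    const cached = await cache.get(key);
    if (cached !== null) return JSON.parse(cached) as T;
  } catch {
    // Redis down: degrade gracefully and skip the cache entirely
  }
  return loadFromDb();
}
```

Requests served this way are slower (every call hits the DB) but never error out because of the cache layer, which matches the "degrades, does not fail" consequence above.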