---
id: ADR-006-caching
title: Caching Strategy
type: Documentation
project: trading-platform
version: 1.0.0
updated_date: 2026-01-04
---

# ADR-006: Caching Strategy

**Status:** Accepted | **Date:** 2025-12-06 | **Deciders:** Tech Lead, Architect | **Related:** ADR-001


## Context

The Trading Platform needs to optimize performance in several areas:

1. **ML predictions**: models take 200-500 ms; users expect < 100 ms
2. **Market data**: trading APIs are limited to 100 requests/min
3. **User sessions**: JWT validation on every request is costly
4. **Rate limiting**: prevent API abuse (DDoS, brute force)
5. **Leaderboards**: complex queries aggregating data across many users
6. **Historical data**: historical data rarely changes

**Performance requirements:**

- API responses < 200 ms (p95)
- ML predictions < 100 ms (cached)
- Support for 10K concurrent users (post-MVP)

## Decision

### Cache Layer: Redis 7.x

Redis as the primary cache because it offers:

- **In-memory**: < 1 ms latency
- **Pub/Sub**: for distributed cache invalidation
- **Data structures**: Lists, Sets, Sorted Sets for leaderboards
- **Native TTL**: automatic expiration
- **Optional persistence**: RDB snapshots for disaster recovery
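
A minimal connection sketch (assuming ioredis; the host/port values and options are illustrative, not existing config):

```typescript
// apps/backend/src/config/redis.ts (illustrative sketch)
import Redis from 'ioredis';

// One shared connection; ioredis reconnects automatically.
export const redis = new Redis({
  host: process.env.REDIS_HOST ?? 'localhost',
  port: Number(process.env.REDIS_PORT ?? 6379),
  // Fail fast instead of queueing commands while Redis is down,
  // so the app can degrade gracefully (see Risks below).
  enableOfflineQueue: false,
});
```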

### Cache Strategy by Data Type

| Data Type | Strategy | TTL | Invalidation |
|-----------|----------|-----|--------------|
| Sessions | Write-through | 7 days | Logout, password change |
| ML Predictions | Cache-aside | 5 min | Model retrain |
| Market Data | Cache-aside | 1 min | Webhook from broker API |
| User Profile | Write-through | 1 hour | Profile update |
| Leaderboards | Cache-aside | 10 min | Cron job rebuild |
| Historical OHLCV | Cache-aside | 24 hours | Never (immutable) |
| Rate Limit | Counter | 1 min | Auto-expire |

### TTL Configuration

```typescript
// apps/backend/src/config/cache.ts
export const CACHE_TTL = {
  SESSION: 60 * 60 * 24 * 7,        // 7 days
  ML_PREDICTION: 60 * 5,             // 5 minutes
  MARKET_DATA: 60,                   // 1 minute
  USER_PROFILE: 60 * 60,             // 1 hour
  LEADERBOARD: 60 * 10,              // 10 minutes
  OHLCV_HISTORICAL: 60 * 60 * 24,    // 24 hours
  RATE_LIMIT: 60,                    // 1 minute
} as const;
```

### Redis Key Naming Convention

```
{app}:{entity}:{id}:{version}
```

Examples:

- `session:user:123abc:v1`
- `ml:prediction:AAPL:1d:v2`
- `market:ohlcv:TSLA:2025-12-06:v1`
- `user:profile:456def:v1`
- `leaderboard:monthly:2025-12:v1`
- `ratelimit:api:192.168.1.1:v1`
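
A small helper keeps keys consistent with this convention (a sketch; the `cacheKey` name and signature are ours, not an existing utility):

```typescript
// apps/backend/src/utils/cacheKey.ts (illustrative sketch)
type KeyPart = string | number;

// Builds `{app}:{entity}:{id}:{version}` keys, e.g.
// cacheKey('ml', 'prediction', 'AAPL:1d', 'v2') -> 'ml:prediction:AAPL:1d:v2'
export function cacheKey(app: KeyPart, entity: KeyPart, id: KeyPart, version = 'v1'): string {
  return [app, entity, id, version].join(':');
}
```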

### Cache-Aside Pattern (Read-Heavy)

```typescript
// apps/backend/src/services/market.service.ts
async function getOHLCV(symbol: string, date: string) {
  const cacheKey = `market:ohlcv:${symbol}:${date}:v1`;

  // 1. Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2. Cache miss: fetch from DB
  const data = await db.ohlcv.findUnique({
    where: { symbol, date }
  });

  // 3. Store in cache (skip nulls so a missing row isn't cached as "null")
  if (data) {
    await redis.setex(
      cacheKey,
      CACHE_TTL.OHLCV_HISTORICAL,
      JSON.stringify(data)
    );
  }

  return data;
}
```

### Write-Through Pattern (Write-Heavy)

```typescript
// apps/backend/src/services/user.service.ts
async function updateProfile(userId: string, data: ProfileUpdate) {
  const cacheKey = `user:profile:${userId}:v1`;

  // 1. Write to DB first
  const updated = await db.user.update({
    where: { id: userId },
    data
  });

  // 2. Update cache
  await redis.setex(
    cacheKey,
    CACHE_TTL.USER_PROFILE,
    JSON.stringify(updated)
  );

  return updated;
}
```

### Rate Limiting

```typescript
// apps/backend/src/middleware/rateLimit.ts
import { Request, Response, NextFunction } from 'express';

const RATE_LIMIT_MAX = 100; // requests per minute

async function rateLimit(req: Request, res: Response, next: NextFunction) {
  const key = `ratelimit:api:${req.ip}:v1`;

  const current = await redis.incr(key);

  // First request in this window: start the TTL. (For strict atomicity,
  // pair INCR and EXPIRE in a MULTI or a Lua script.)
  if (current === 1) {
    await redis.expire(key, CACHE_TTL.RATE_LIMIT);
  }

  if (current > RATE_LIMIT_MAX) {
    return res.status(429).json({
      error: 'Too many requests, try again later'
    });
  }

  next();
}
```
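
Mounted like any Express middleware, e.g. `app.use(rateLimit);` ahead of the API routes (usage sketch).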

### ML Predictions Cache

```python
# apps/ml-engine/cache.py
import json
from datetime import timedelta

# Async client so cache calls don't block the event loop
import redis.asyncio as redis

redis_client = redis.Redis(host='localhost', port=6379)

async def get_prediction(symbol: str, timeframe: str):
    cache_key = f"ml:prediction:{symbol}:{timeframe}:v2"

    # Try cache
    cached = await redis_client.get(cache_key)
    if cached:
        return json.loads(cached)

    # Generate prediction (expensive); `model` is the ML model loaded
    # elsewhere in the engine
    prediction = await model.predict(symbol, timeframe)

    # Cache for 5 minutes
    await redis_client.setex(
        cache_key,
        timedelta(minutes=5),
        json.dumps(prediction)
    )

    return prediction
```

### Cache Invalidation

```typescript
// apps/backend/src/services/cache.service.ts
class CacheInvalidator {
  // Invalidate a single key
  async invalidate(key: string) {
    await redis.del(key);
  }

  // Invalidate by pattern. SCAN (here via ioredis scanStream) walks the
  // keyspace incrementally, unlike KEYS, which blocks Redis on large datasets.
  async invalidatePattern(pattern: string) {
    const stream = redis.scanStream({ match: pattern, count: 100 });
    for await (const keys of stream) {
      if (keys.length > 0) {
        await redis.del(...keys);
      }
    }
  }

  // Invalidate on model retrain
  async invalidateMLPredictions() {
    await this.invalidatePattern('ml:prediction:*');
  }

  // Invalidate user session
  async invalidateSession(userId: string) {
    await this.invalidatePattern(`session:user:${userId}:*`);
  }
}
```

## Consequences

### Positive

1. **Performance**: API responses < 100 ms (cached) vs ~500 ms (uncached)
2. **Cost savings**: 90% fewer requests to external market-data APIs
3. **Scalability**: Redis supports 100K ops/sec on modest hardware
4. **Rate limiting**: prevents abuse without affecting legitimate users
5. **ML efficiency**: cached predictions reduce GPU load
6. **Developer experience**: Redis CLI for cache debugging
7. **Session management**: instant logout via invalidation

### Negative

1. **Stale data**: the cache can serve outdated data for up to the TTL
2. **Memory cost**: Redis needs ~2 GB RAM for 10K active users
3. **Complexity**: cache invalidation is "one of the two hard problems"
4. **Cold start**: the first requests are slow (cache warming needed)
5. **Thundering herd**: simultaneous requests pile up when an entry expires
6. **Redis as SPOF**: if Redis goes down, performance degrades (the app keeps working, just slower)

## Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| Cache invalidation bugs | Unit tests + integration tests for the cache layer |
| Thundering herd | Cache locking (SETNX pattern; see the sketch below) |
| Redis down | Graceful degradation: the app works without cache, just slower |
| Memory exhausted | Eviction policy: `allkeys-lru` |
| Stale ML predictions | Short TTL (5 min) + manual invalidation |
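
A sketch of the SETNX-style locking mitigation (assuming ioredis; `getWithLock`, the 10 s lock TTL, and the 100 ms retry delay are illustrative choices, not existing code):

```typescript
// Thundering-herd guard: only one caller rebuilds an expired entry.
async function getWithLock<T>(key: string, ttl: number, load: () => Promise<T>): Promise<T> {
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached) as T;

  // SET ... EX ... NX acquires a short-lived lock atomically.
  const gotLock = await redis.set(`${key}:lock`, '1', 'EX', 10, 'NX');
  if (gotLock) {
    try {
      const fresh = await load();
      await redis.setex(key, ttl, JSON.stringify(fresh));
      return fresh;
    } finally {
      await redis.del(`${key}:lock`);
    }
  }

  // Another caller is rebuilding: wait briefly, then retry the cache.
  await new Promise((resolve) => setTimeout(resolve, 100));
  return getWithLock(key, ttl, load);
}
```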

## Alternatives Considered

### 1. Memcached

- **Pros**: simple, very fast, lower memory footprint than Redis
- **Cons**: no persistence, no data structures, no pub/sub
- **Decision**: Rejected. Redis offers more features at the same cost.

### 2. Application-Level Cache (Node.js Map)

- **Pros**: zero latency, no external dependency
- **Cons**: not distributed, lost on app restart
- **Decision**: Rejected. Does not scale to multiple instances.

### 3. CDN Caching (Cloudflare/CloudFront)

- **Pros**: global distribution, DDoS protection
- **Cons**: static assets only, does not work for the API
- **Decision**: ⚠️ Complementary. Use for the frontend, not for the API.

### 4. Database Query Cache (PostgreSQL)

- **Pros**: already included in Postgres
- **Cons**: limited, not configurable per query
- **Decision**: Insufficient. We need more aggressive caching.

### 5. GraphQL + DataLoader

- **Pros**: automatic batching, per-request cache
- **Cons**: requires migrating from REST to GraphQL
- **Decision**: Rejected. REST is sufficient for the MVP.

### 6. No Cache (Optimize DB Queries)

- **Pros**: less complexity, no stale data
- **Cons**: impossible to reach < 200 ms for ML predictions
- **Decision**: Rejected. Performance requirements would be unattainable.

## Cache Warming Strategy

```typescript
// apps/backend/src/jobs/cache-warmer.ts
import cron from 'node-cron';

// Run every 5 minutes
cron.schedule('*/5 * * * *', async () => {
  const topSymbols = ['AAPL', 'TSLA', 'GOOGL', 'MSFT', 'AMZN'];

  for (const symbol of topSymbols) {
    // Pre-cache ML predictions for popular symbols
    await mlService.getPrediction(symbol, '1d');
    await mlService.getPrediction(symbol, '1w');

    // Pre-cache market data
    await marketService.getOHLCV(symbol, today());
  }

  // Pre-cache leaderboards
  await leaderboardService.getMonthly();
});
```

## Monitoring

### Redis Metrics to Track

- Hit/Miss Ratio (target > 80%)
- Evicted Keys (should be 0)
- Memory Usage (alert at > 80%)
- Connected Clients
- Commands/sec
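
The hit/miss ratio can be computed from `INFO stats` (a sketch, assuming ioredis; `keyspace_hits` and `keyspace_misses` are standard Redis counters):

```typescript
// Derive the cache hit ratio from Redis INFO (illustrative sketch).
async function cacheHitRatio(): Promise<number> {
  const stats = await redis.info('stats');
  // INFO returns "name:value" lines; extract the two counters we need.
  const counter = (name: string) =>
    Number(stats.match(new RegExp(`${name}:(\\d+)`))?.[1] ?? 0);
  const hits = counter('keyspace_hits');
  const misses = counter('keyspace_misses');
  return hits + misses === 0 ? 0 : hits / (hits + misses);
}
```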

### Logging

```typescript
// Log cache hits/misses
logger.info('cache_hit', { key, ttl: remaining });
logger.warn('cache_miss', { key, reason: 'expired' });
```

## References