---
id: "INTEGRACION-LLM-LOCAL"
title: "Local LLM Integration - chatgpt-oss 16GB"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---

# Local LLM Integration - chatgpt-oss 16GB

**Version:** 1.0.0
**Date:** 2025-12-08
**Module:** OQI-007-llm-agent
**Author:** Trading Strategist - Trading Platform

---
## Table of Contents

1. [Overview](#overview)
2. [Hardware Specifications](#hardware-specifications)
3. [Integration Architecture](#integration-architecture)
4. [Model Configuration](#model-configuration)
5. [Trading Tools](#trading-tools)
6. [System Prompt](#system-prompt)
7. [API Endpoints](#api-endpoints)
8. [Context Management](#context-management)
9. [Implementation](#implementation)
10. [Testing and Validation](#testing-and-validation)

---
## Overview

### Objective

Integrate a local LLM (chatgpt-oss or equivalent) that works as a trading copilot, able to:

1. **Analyze ML signals** and explain them in natural language
2. **Make trading decisions** based on context
3. **Execute operations** via MetaTrader4
4. **Answer questions** about markets and strategies
5. **Manage alerts** and notifications

### High-Level Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ LLM TRADING COPILOT │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ User Interface │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Chat Interface / CLI / Telegram Bot │ │
│ └───────────────────────────────────┬─────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ LLM Service (FastAPI) │ │
│ │ ┌─────────────────────────────────────────────────────────────────┐│ │
│ │ │ Request Handler ││ │
│ │ │ - Parse user input ││ │
│ │ │ - Load context from Redis ││ │
│ │ │ - Build prompt ││ │
│ │ └──────────────────────────────────┬──────────────────────────────┘│ │
│ │ │ │ │
│ │ ┌──────────────────────────────────▼──────────────────────────────┐│ │
│ │ │ LLM Engine (chatgpt-oss / Llama3) ││ │
│ │ │ ┌────────────────────────────────────────────────────────────┐ ││ │
│ │ │ │ Ollama / vLLM / llama.cpp │ ││ │
│ │ │ │ - GPU: NVIDIA RTX 5060 Ti 16GB │ ││ │
│ │ │ │ - Context: 8K tokens │ ││ │
│ │ │ │ - Response time: <3s │ ││ │
│ │ │ └────────────────────────────────────────────────────────────┘ ││ │
│ │ └──────────────────────────────────┬──────────────────────────────┘│ │
│ │ │ │ │
│ │ ┌──────────────────────────────────▼──────────────────────────────┐│ │
│ │ │ Tool Executor ││ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ││ │
│ │ │ │get_signal│ │ analyze │ │ execute │ │ portfolio│ ││ │
│ │ │ │ │ │ _market │ │ _trade │ │ _status │ ││ │
│ │ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ ││ │
│ │ │ │ │ │ │ ││ │
│ │ └───────┼────────────┼────────────┼────────────┼──────────────────┘│ │
│ └──────────┼────────────┼────────────┼────────────┼────────────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ External Services │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ ML Engine │ │ PostgreSQL │ │ MetaTrader4 │ │ Redis │ │ │
│ │ │ (signals) │ │ (data) │ │ (trading) │ │ (context) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```

---
## Hardware Specifications

### Available GPU

| Specification | Value |
|----------------|-------|
| **Model** | NVIDIA RTX 5060 Ti |
| **VRAM** | 16GB GDDR7 |
| **CUDA Cores** | TBD |
| **Tensor Cores** | Yes |

### Compatible Models

| Model | VRAM Req. | Context | Speed | Recommendation |
|--------|----------|---------|-----------|---------------|
| **Llama 3 8B** | ~10GB | 8K | ~20 tok/s | Recommended |
| **Mistral 7B** | ~8GB | 8K | ~25 tok/s | Alternative |
| **Qwen2 7B** | ~9GB | 32K | ~22 tok/s | Alternative |
| **Phi-3 Mini 3.8B** | ~4GB | 4K | ~40 tok/s | Backup |

### Optimal Configuration
```yaml
# Llama 3 8B Instruct
model:
  name: llama3:8b-instruct-q5_K_M
  quantization: Q5_K_M  # quality/VRAM trade-off
  context_length: 8192
  batch_size: 512

gpu:
  device: cuda:0
  memory_fraction: 0.85  # 13.6GB of 16GB
  offload: false

inference:
  temperature: 0.7
  top_p: 0.9
  max_tokens: 2048
  repeat_penalty: 1.1
```
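The VRAM requirements listed in the table above can be sanity-checked with a rule of thumb: quantized weight size (parameters × bits per weight) plus roughly 20% overhead for the KV cache and runtime buffers. A sketch; the function name and the 1.2 overhead factor are illustrative, not vendor figures:

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: quantized weight size plus ~20% for the
    KV cache and runtime buffers (a heuristic, not a vendor figure)."""
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
    return round(weight_gb * overhead, 1)

# Llama 3 8B at Q5_K_M (~5.5 effective bits/weight)
print(estimate_vram_gb(8, 5.5))  # 6.6
```

At ~6.6 GB of weights plus context, the ~10GB figure in the table fits comfortably under the 0.85 `memory_fraction` cap (13.6GB).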
---

## Integration Architecture

### Components

```
┌──────────────────────────────────────────────────────────────────────┐
│ LLM SERVICE COMPONENTS │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ 1. LLM Runtime (Ollama) │
│ ├── Model Server (GPU) │
│ ├── REST API (:11434) │
│ └── Model Management │
│ │
│ 2. Trading Service (FastAPI) │
│ ├── Chat Endpoint (/api/chat) │
│ ├── Tool Executor │
│ └── Response Formatter │
│ │
│ 3. Context Manager (Redis) │
│ ├── Conversation History │
│ ├── Market Context │
│ └── User Preferences │
│ │
│ 4. Trading Tools │
│ ├── get_ml_signal() │
│ ├── analyze_market() │
│ ├── execute_trade() │
│ ├── get_portfolio() │
│ └── set_alert() │
│ │
└──────────────────────────────────────────────────────────────────────┘
```
### Request Flow

```
User Message
│
▼
┌─────────────────┐
│ FastAPI Handler │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐
│ Load Context │◄───▶│ Redis │
│ from Redis │ └─────────────────┘
└────────┬────────┘
│
▼
┌─────────────────┐
│ Build Prompt │
│ (system + hist │
│ + user msg) │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐
│ LLM Inference │◄───▶│ Ollama (GPU) │
│ │ └─────────────────┘
└────────┬────────┘
│
▼
┌─────────────────┐
│ Parse Response │
│ Check for Tools │
└────────┬────────┘
│
┌────┴────┐
│ Tools? │
└────┬────┘
Yes │ No
│ │
▼ ▼
┌───────┐ ┌───────────────┐
│Execute│ │Return Response│
│ Tools │ │ │
└───┬───┘ └───────────────┘
│
▼
┌─────────────────┐
│ Format Results │
│ Send to LLM │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Final Response │
│ Save to Context │
└─────────────────┘
```
---

## Model Configuration

### Ollama Installation

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download the model
ollama pull llama3:8b-instruct-q5_K_M

# Run with verbose output to confirm GPU offload
ollama run --verbose llama3:8b-instruct-q5_K_M

# Check VRAM usage
nvidia-smi
```
### Configuration File

```yaml
# config/llm_config.yaml

llm:
  provider: ollama
  model: llama3:8b-instruct-q5_K_M
  base_url: http://localhost:11434

inference:
  temperature: 0.7
  top_p: 0.9
  max_tokens: 2048
  stop_sequences:
    - "</tool>"
    - "Human:"

context:
  max_history: 10  # last 10 messages
  include_market_context: true
  include_portfolio: true

redis:
  host: localhost
  port: 6379
  db: 0
  prefix: "llm:"
  ttl: 3600  # 1 hour

tools:
  timeout: 30  # seconds per tool call
  retry_count: 3
  parallel_execution: true

logging:
  level: INFO
  file: logs/llm_service.log
```
---

## Trading Tools

### Tool Definitions

```python
# tools/trading_tools.py

from typing import Dict, Any, Optional

import httpx


class TradingTools:
    """Trading tools available to the LLM."""

    def __init__(self, config: Dict):
        self.ml_engine_url = config['ml_engine_url']
        self.trading_url = config['trading_url']
        self.data_url = config['data_url']

    async def get_ml_signal(
        self,
        symbol: str,
        timeframe: str = "5m"
    ) -> Dict[str, Any]:
        """
        Fetch the current ML signal for a symbol.

        Args:
            symbol: Trading pair (XAUUSD, EURUSD, etc.)
            timeframe: Timeframe (5m, 15m, 1h)

        Returns:
            {
                "action": "LONG" | "SHORT" | "HOLD",
                "confidence": 0.78,
                "entry_price": 2650.50,
                "stop_loss": 2645.20,
                "take_profit": 2661.10,
                "risk_reward": 2.0,
                "amd_phase": "accumulation",
                "killzone": "london_open",
                "reasoning": ["AMD: Accumulation (78%)", ...]
            }
        """
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"{self.ml_engine_url}/api/signal",
                params={"symbol": symbol, "timeframe": timeframe}
            )
            return response.json()

    async def analyze_market(
        self,
        symbol: str
    ) -> Dict[str, Any]:
        """
        Analyze the current market state.

        Returns:
            {
                "symbol": "XAUUSD",
                "current_price": 2650.50,
                "amd_phase": "accumulation",
                "ict_context": {
                    "killzone": "london_open",
                    "ote_zone": "discount",
                    "score": 0.72
                },
                "key_levels": {
                    "resistance": [2660.00, 2675.50],
                    "support": [2640.00, 2625.00]
                },
                "recent_signals": [...],
                "trend": "bullish",
                "volatility": "medium"
            }
        """
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"{self.ml_engine_url}/api/analysis",
                params={"symbol": symbol}
            )
            return response.json()

    async def execute_trade(
        self,
        symbol: str,
        action: str,
        size: float,
        stop_loss: float,
        take_profit: float,
        account_id: Optional[str] = None
    ) -> Dict[str, Any]:
        """
        Execute a trading operation.

        Args:
            symbol: Trading pair
            action: "BUY" or "SELL"
            size: Position size (lots)
            stop_loss: Stop-loss price
            take_profit: Take-profit price
            account_id: MT4 account ID (optional)

        Returns:
            {
                "success": true,
                "ticket": 123456,
                "executed_price": 2650.45,
                "slippage": 0.05,
                "message": "Order executed successfully"
            }
        """
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.trading_url}/api/trade",
                json={
                    "symbol": symbol,
                    "action": action,
                    "size": size,
                    "stop_loss": stop_loss,
                    "take_profit": take_profit,
                    "account_id": account_id
                }
            )
            return response.json()

    async def get_portfolio(
        self,
        account_id: Optional[str] = None
    ) -> Dict[str, Any]:
        """
        Fetch the portfolio state.

        Returns:
            {
                "balance": 10000.00,
                "equity": 10150.00,
                "margin": 500.00,
                "free_margin": 9650.00,
                "positions": [
                    {
                        "ticket": 123456,
                        "symbol": "XAUUSD",
                        "type": "BUY",
                        "size": 0.1,
                        "open_price": 2640.00,
                        "current_price": 2650.50,
                        "profit": 105.00,
                        "stop_loss": 2630.00,
                        "take_profit": 2660.00
                    }
                ],
                "daily_pnl": 250.00,
                "weekly_pnl": 450.00
            }
        """
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"{self.trading_url}/api/portfolio",
                params={"account_id": account_id}
            )
            return response.json()

    async def set_alert(
        self,
        symbol: str,
        condition: str,
        price: float,
        message: str
    ) -> Dict[str, Any]:
        """
        Set a price alert.

        Args:
            symbol: Trading pair
            condition: "above" or "below"
            price: Target price
            message: Alert message

        Returns:
            {
                "alert_id": "alert_123",
                "status": "active",
                "message": "Alert created successfully"
            }
        """
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.trading_url}/api/alerts",
                json={
                    "symbol": symbol,
                    "condition": condition,
                    "price": price,
                    "message": message
                }
            )
            return response.json()

    async def get_market_data(
        self,
        symbol: str,
        timeframe: str = "5m",
        bars: int = 100
    ) -> Dict[str, Any]:
        """
        Fetch historical market data.

        Returns:
            {
                "symbol": "XAUUSD",
                "timeframe": "5m",
                "data": [
                    {"time": "2024-12-08 10:00", "open": 2648, "high": 2652, ...},
                    ...
                ]
            }
        """
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"{self.data_url}/api/market_data",
                params={
                    "symbol": symbol,
                    "timeframe": timeframe,
                    "bars": bars
                }
            )
            return response.json()
```
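The `tools.timeout` and `retry_count` settings in the configuration file have to be enforced somewhere around these HTTP calls. A generic wrapper sketch; the helper name is illustrative and not part of the module above:

```python
import asyncio

async def call_with_retry(make_call, timeout: float = 30.0, retries: int = 3):
    """Run an async tool call with a per-attempt timeout and bounded
    retries, mirroring the tools.timeout / retry_count settings."""
    last_error = None
    for _ in range(retries):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (asyncio.TimeoutError, OSError) as error:
            last_error = error  # timeouts and network errors are retried
    raise last_error
```

Usage would look like `await call_with_retry(lambda: tools.get_ml_signal("XAUUSD"))`.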
### Tool Schema for the LLM

```python
# tools/tool_schema.py

TOOL_DEFINITIONS = [
    {
        "name": "get_ml_signal",
        "description": "Returns the current trading signal generated by the ML models. Use this to get the system's recommendation.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Trading pair (XAUUSD, EURUSD, GBPUSD, USDJPY)",
                    "enum": ["XAUUSD", "EURUSD", "GBPUSD", "USDJPY"]
                },
                "timeframe": {
                    "type": "string",
                    "description": "Analysis timeframe",
                    "enum": ["5m", "15m", "1h"],
                    "default": "5m"
                }
            },
            "required": ["symbol"]
        }
    },
    {
        "name": "analyze_market",
        "description": "Analyzes the current market state including AMD phase, ICT context, key levels, and trend.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Trading pair",
                    "enum": ["XAUUSD", "EURUSD", "GBPUSD", "USDJPY"]
                }
            },
            "required": ["symbol"]
        }
    },
    {
        "name": "execute_trade",
        "description": "Executes a trading operation. IMPORTANT: Always confirm with the user before executing.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Trading pair"
                },
                "action": {
                    "type": "string",
                    "description": "Order type",
                    "enum": ["BUY", "SELL"]
                },
                "size": {
                    "type": "number",
                    "description": "Size in lots (0.01 - 10.0)"
                },
                "stop_loss": {
                    "type": "number",
                    "description": "Stop-loss price"
                },
                "take_profit": {
                    "type": "number",
                    "description": "Take-profit price"
                }
            },
            "required": ["symbol", "action", "size", "stop_loss", "take_profit"]
        }
    },
    {
        "name": "get_portfolio",
        "description": "Returns the current portfolio state including balance, open positions, and P&L.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "name": "set_alert",
        "description": "Sets a price alert for notification.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Trading pair"
                },
                "condition": {
                    "type": "string",
                    "enum": ["above", "below"]
                },
                "price": {
                    "type": "number",
                    "description": "Target price"
                },
                "message": {
                    "type": "string",
                    "description": "Alert message"
                }
            },
            "required": ["symbol", "condition", "price", "message"]
        }
    }
]
```
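Base models served through Ollama have no native function-calling, so these definitions have to reach the model as text. One option is to render them into the system prompt and have the model emit `<tool>name({...})</tool>` markers, the convention the LLM service parses. A sketch with an illustrative helper name:

```python
def render_tool_prompt(tool_definitions: list) -> str:
    """Render JSON-schema tool definitions as prompt text for a model
    without native tool-calling (illustrative helper, not part of the API)."""
    lines = ['Call tools by emitting <tool>name({"arg": value})</tool>.']
    for tool in tool_definitions:
        params = ", ".join(tool["parameters"]["properties"]) or "no arguments"
        lines.append(f"- {tool['name']}({params}): {tool['description']}")
    return "\n".join(lines)
```

Appending `render_tool_prompt(TOOL_DEFINITIONS)` to the system prompt is one way to wire this in.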
---

## System Prompt

### Main Prompt

```python
# prompts/system_prompt.py

SYSTEM_PROMPT = """You are Trading Platform AI, a trading copilot specialized in forex and precious-metals markets.

## Your Role
- Analyze signals from the ML models and explain them clearly
- Help make informed trading decisions
- Execute operations when the user authorizes them
- Manage risk and protect capital

## Knowledge
You have access to ML models that analyze:
- **AMD (Accumulation-Manipulation-Distribution)**: Market phases
  - Accumulation: Institutions accumulating, better for longs
  - Manipulation: Stop hunts, avoid entries
  - Distribution: Institutions selling, better for shorts
- **ICT Concepts**: Killzones, OTE, Premium/Discount
  - London Open (02:00-05:00 EST): High probability
  - NY AM (08:30-11:00 EST): Maximum liquidity
  - Discount Zone (0-50%): Better for buys
  - Premium Zone (50-100%): Better for sells
- **SMC**: BOS, CHOCH, Inducement, Displacement

## Available Tools
1. `get_ml_signal(symbol, timeframe)` - Fetch the current ML signal
2. `analyze_market(symbol)` - Full market analysis
3. `execute_trade(symbol, action, size, sl, tp)` - Execute an operation
4. `get_portfolio()` - Portfolio state
5. `set_alert(symbol, condition, price, message)` - Create an alert

## Important Rules
1. **ALWAYS** explain the reasoning behind every recommendation
2. **NEVER** execute trades without explicit user confirmation
3. **ALWAYS** mention the risk (R:R, % of account)
4. **PRIORITIZE** capital preservation
5. If model confidence is <60%, recommend WAITING
6. During a Manipulation phase, recommend NOT TRADING

## Response Format for Signals
When presenting a signal, use this format:

**Signal: [LONG/SHORT/HOLD]**
- Symbol: [SYMBOL]
- Confidence: [X]%
- AMD Phase: [phase]
- Killzone: [killzone]

**Levels:**
- Entry: [price]
- Stop Loss: [price] ([X] pips)
- Take Profit: [price] ([X] pips)
- R:R: [X]:1

**Reasoning:**
1. [reason 1]
2. [reason 2]
3. [reason 3]

**Risk:** [X]% of account

---

## Current Context
{market_context}

## Portfolio
{portfolio_context}
"""


def build_system_prompt(market_context: dict = None, portfolio_context: dict = None) -> str:
    """Build the system prompt with context."""

    market_str = ""
    if market_context:
        market_str = f"""
Current price: {market_context.get('current_price', 'N/A')}
AMD phase: {market_context.get('amd_phase', 'N/A')}
Killzone: {market_context.get('killzone', 'N/A')}
Trend: {market_context.get('trend', 'N/A')}
"""

    portfolio_str = ""
    if portfolio_context:
        portfolio_str = f"""
Balance: ${portfolio_context.get('balance', 0):,.2f}
Equity: ${portfolio_context.get('equity', 0):,.2f}
Open positions: {len(portfolio_context.get('positions', []))}
Daily P&L: ${portfolio_context.get('daily_pnl', 0):,.2f}
"""

    return SYSTEM_PROMPT.format(
        market_context=market_str or "Not available",
        portfolio_context=portfolio_str or "Not available"
    )
```
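The "Risk: [X]% of account" rule implies a position-sizing calculation that the prompt leaves implicit. A minimal sketch; the function and the pip-value default are illustrative, not part of the documented tools:

```python
def position_size_lots(balance: float, risk_pct: float, stop_pips: float,
                       pip_value_per_lot: float = 10.0) -> float:
    """Lots sized so that hitting the stop loses about risk_pct of balance."""
    risk_amount = balance * risk_pct / 100
    lots = risk_amount / (stop_pips * pip_value_per_lot)
    return max(0.01, round(lots, 2))  # assumes a broker minimum of 0.01 lots

# Risking 1% of a $10,000 account with a 50-pip stop:
print(position_size_lots(10_000, 1.0, 50))  # 0.2
```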
---

## API Endpoints

### FastAPI Service

```python
# services/llm_service.py

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import redis
import json

app = FastAPI(title="Trading Platform LLM Service")


class Message(BaseModel):
    role: str  # "user" or "assistant"
    content: str


class ChatRequest(BaseModel):
    message: str
    session_id: str
    symbol: Optional[str] = "XAUUSD"


class ChatResponse(BaseModel):
    response: str
    tools_used: List[str]
    signal: Optional[dict] = None


# Redis connection
redis_client = redis.Redis(host='localhost', port=6379, db=0)


@app.post("/api/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    """
    Main chat endpoint.

    Helpers such as load_conversation_history, save_conversation,
    get_market_context and get_portfolio_context wrap the ContextManager
    and TradingTools defined elsewhere in this document.
    """
    try:
        # 1. Load conversation history
        history = load_conversation_history(request.session_id)

        # 2. Get market context
        market_context = await get_market_context(request.symbol)

        # 3. Get portfolio context
        portfolio_context = await get_portfolio_context()

        # 4. Build prompt
        system_prompt = build_system_prompt(market_context, portfolio_context)

        # 5. Call LLM
        response, tools_used = await call_llm(
            system_prompt=system_prompt,
            history=history,
            user_message=request.message
        )

        # 6. Save to history
        save_conversation(request.session_id, request.message, response)

        # 7. Extract signal if present
        signal = extract_signal_from_response(response)

        return ChatResponse(
            response=response,
            tools_used=tools_used,
            signal=signal
        )

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/api/health")
async def health():
    """Health check."""
    return {"status": "healthy", "llm_status": await check_llm_status()}


@app.post("/api/clear_history")
async def clear_history(session_id: str):
    """Clear the conversation history."""
    redis_client.delete(f"llm:history:{session_id}")
    return {"status": "cleared"}


async def call_llm(
    system_prompt: str,
    history: List[Message],
    user_message: str
) -> tuple[str, List[str]]:
    """Call the LLM via Ollama."""
    import httpx

    # Build messages
    messages = [{"role": "system", "content": system_prompt}]
    for msg in history:
        messages.append({"role": msg.role, "content": msg.content})
    messages.append({"role": "user", "content": user_message})

    # Call Ollama
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            "http://localhost:11434/api/chat",
            json={
                "model": "llama3:8b-instruct-q5_K_M",
                "messages": messages,
                "stream": False,
                "options": {
                    "temperature": 0.7,
                    "top_p": 0.9,
                    "num_predict": 2048
                }
            }
        )

    result = response.json()
    llm_response = result["message"]["content"]

    # Check for tool calls
    tools_used = []
    if "<tool>" in llm_response:
        llm_response, tools_used = await process_tool_calls(llm_response)

    return llm_response, tools_used


async def process_tool_calls(response: str) -> tuple[str, List[str]]:
    """Process tool calls embedded in the response."""
    import re

    tools_used = []
    tool_pattern = r"<tool>(\w+)\((.*?)\)</tool>"
    matches = re.findall(tool_pattern, response)

    for tool_name, args_str in matches:
        tools_used.append(tool_name)

        # Parse arguments (expects JSON-style key/value pairs)
        args = json.loads(f"{{{args_str}}}")

        # Execute tool
        tool_result = await execute_tool(tool_name, args)

        # Replace the tool call with its result in the response
        response = response.replace(
            f"<tool>{tool_name}({args_str})</tool>",
            f"\n**{tool_name} result:**\n```json\n{json.dumps(tool_result, indent=2)}\n```\n"
        )

    return response, tools_used
```
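The argument parsing above (`json.loads(f"{{{args_str}}}")`) assumes the model always emits well-formed key/value pairs and raises on anything else. A slightly more defensive variant, sketched with an illustrative helper name:

```python
import json

def parse_tool_args(args_str: str) -> dict:
    """Parse the text between the parentheses of a <tool>...</tool> call.
    Accepts a JSON object with or without surrounding braces; returns {}
    on malformed input instead of raising."""
    args_str = args_str.strip()
    if not args_str:
        return {}
    if not args_str.startswith("{"):
        args_str = "{" + args_str + "}"
    try:
        parsed = json.loads(args_str)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}
```

Swallowing the error keeps a single malformed call from failing the whole chat turn; the tool executor can then report "invalid arguments" back to the model.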
---

## Context Management

### Redis Schema

```python
# context/redis_schema.py

"""
Redis key structure:

llm:history:{session_id}     - Conversation history (LIST)
llm:market_context:{symbol}  - Market context cache (STRING, TTL=60s)
llm:portfolio:{user_id}      - Portfolio cache (STRING, TTL=30s)
llm:user_prefs:{user_id}     - User preferences (HASH)
llm:alerts:{user_id}         - Active alerts (SET)
"""

import redis
from typing import List, Dict, Optional
import json


class ContextManager:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.history_ttl = 3600  # 1 hour
        self.cache_ttl = 60  # 60 seconds

    def get_conversation_history(
        self,
        session_id: str,
        max_messages: int = 10
    ) -> List[Dict]:
        """Fetch the conversation history."""
        key = f"llm:history:{session_id}"
        history = self.redis.lrange(key, -max_messages, -1)
        return [json.loads(h) for h in history]

    def add_to_history(
        self,
        session_id: str,
        user_message: str,
        assistant_response: str
    ):
        """Append a user/assistant message pair to the history."""
        key = f"llm:history:{session_id}"

        self.redis.rpush(key, json.dumps({
            "role": "user",
            "content": user_message
        }))
        self.redis.rpush(key, json.dumps({
            "role": "assistant",
            "content": assistant_response
        }))

        # Keep only the last N entries
        self.redis.ltrim(key, -20, -1)
        self.redis.expire(key, self.history_ttl)

    def get_market_context(self, symbol: str) -> Optional[Dict]:
        """Fetch the cached market context."""
        key = f"llm:market_context:{symbol}"
        data = self.redis.get(key)
        return json.loads(data) if data else None

    def set_market_context(self, symbol: str, context: Dict):
        """Cache the market context."""
        key = f"llm:market_context:{symbol}"
        self.redis.setex(key, self.cache_ttl, json.dumps(context))

    def get_user_preferences(self, user_id: str) -> Dict:
        """Fetch the user's preferences."""
        key = f"llm:user_prefs:{user_id}"
        return self.redis.hgetall(key) or {}

    def set_user_preference(self, user_id: str, pref: str, value: str):
        """Set a user preference."""
        key = f"llm:user_prefs:{user_id}"
        self.redis.hset(key, pref, value)
```
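The `ltrim(key, -20, -1)` call caps the stored history at 20 entries, i.e. the 10 most recent user/assistant exchanges. The same window logic without Redis, for reference:

```python
def trim_history(entries: list, max_entries: int = 20) -> list:
    """Keep only the newest max_entries items, like LTRIM key -20 -1."""
    return entries[-max_entries:]

history = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
           for i in range(30)]
print(len(trim_history(history)))  # 20
```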
---

## Implementation

### Docker Compose

**IMPORTANT:** Ports must follow the policy defined in `/core/devtools/environment/DEVENV-PORTS.md`

**Ports assigned to trading-platform:**
- Base range: 3600
- Frontend: 5179
- Backend API: 3600
- Database: 5438 (or shared 5432)
- Redis: 6385
- MinIO: 9600/9601

```yaml
# docker-compose.llm.yaml

version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: trading-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

  llm-service:
    build:
      context: .
      dockerfile: Dockerfile.llm
    container_name: trading-llm
    ports:
      - "3602:3602"  # LLM service (base 3600 + 2)
    environment:
      - OLLAMA_URL=http://ollama:11434
      - REDIS_URL=redis://redis:6379  # container port; 6385 is only the host mapping
      - ML_ENGINE_URL=http://ml-engine:3601
      - TRADING_URL=http://trading-service:3603
    depends_on:
      - ollama
      - redis
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    container_name: trading-redis
    ports:
      - "6385:6379"  # Host port assigned per DEVENV-PORTS
    volumes:
      - redis_data:/data
    restart: unless-stopped

volumes:
  ollama_data:
  redis_data:
```
### Initialization Script

```bash
#!/bin/bash
# scripts/init_llm.sh

echo "=== Trading Platform LLM Setup ==="

# 1. Check GPU
echo "Checking GPU..."
nvidia-smi

# 2. Start Ollama
echo "Starting Ollama..."
ollama serve &
sleep 5

# 3. Pull model
echo "Pulling Llama3 model..."
ollama pull llama3:8b-instruct-q5_K_M

# 4. Test model
echo "Testing model..."
ollama run llama3:8b-instruct-q5_K_M "Hello, respond with OK if working"

# 5. Start services
echo "Starting LLM service..."
docker-compose -f docker-compose.llm.yaml up -d

echo "=== Setup Complete ==="
echo "LLM Service: http://localhost:3602"
echo "Ollama API: http://localhost:11434"
```
---

## Testing and Validation

### Test Cases

```python
# tests/test_llm_service.py

import pytest
import httpx

LLM_URL = "http://localhost:3602"  # Port assigned per DEVENV-PORTS (base 3600 + 2)


@pytest.mark.asyncio
async def test_health_check():
    """Test the health endpoint."""
    async with httpx.AsyncClient() as client:
        response = await client.get(f"{LLM_URL}/api/health")
        assert response.status_code == 200
        assert response.json()["status"] == "healthy"


@pytest.mark.asyncio
async def test_simple_chat():
    """Test basic chat."""
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            f"{LLM_URL}/api/chat",
            json={
                "message": "Hola, como estas?",
                "session_id": "test_session"
            }
        )
        assert response.status_code == 200
        assert len(response.json()["response"]) > 0


@pytest.mark.asyncio
async def test_get_signal():
    """Test signal retrieval via the LLM."""
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            f"{LLM_URL}/api/chat",
            json={
                "message": "Dame la senal actual para XAUUSD",
                "session_id": "test_signal",
                "symbol": "XAUUSD"
            }
        )
        assert response.status_code == 200
        data = response.json()
        assert "get_ml_signal" in data["tools_used"]


@pytest.mark.asyncio
async def test_response_time():
    """Test the end-to-end response-time budget (5s including tool calls)."""
    import time

    async with httpx.AsyncClient(timeout=60.0) as client:
        start = time.time()
        response = await client.post(
            f"{LLM_URL}/api/chat",
            json={
                "message": "Analiza el mercado de XAUUSD",
                "session_id": "test_perf"
            }
        )
        elapsed = time.time() - start

    assert response.status_code == 200
    assert elapsed < 5.0  # 5s max (including tool calls)
```
### Validation Metrics

| Metric | Target | How to Measure |
|---------|--------|------------|
| Response Time | <3s | pytest benchmark |
| Tool Accuracy | >95% | Manual review |
| Context Retention | 100% | Test history |
| GPU Memory | <14GB | nvidia-smi |
| Uptime | >99% | Monitoring |

---
**Document Generated:** 2025-12-08
**Trading Strategist - Trading Platform**