---
id: "INTEGRACION-LLM-LOCAL"
title: "Local LLM Integration - chatgpt-oss 16GB"
type: "Documentation"
project: "trading-platform"
version: "1.0.0"
updated_date: "2026-01-04"
---

# Local LLM Integration - chatgpt-oss 16GB

**Version:** 1.0.0
**Date:** 2025-12-08
**Module:** OQI-007-llm-agent
**Author:** Trading Strategist - Trading Platform

---

## Table of Contents

1. [Overview](#overview)
2. [Hardware Specifications](#hardware-specifications)
3. [Integration Architecture](#integration-architecture)
4. [Model Configuration](#model-configuration)
5. [Trading Tools](#trading-tools)
6. [System Prompt](#system-prompt)
7. [API Endpoints](#api-endpoints)
8. [Context Management](#context-management)
9. [Implementation](#implementation)
10. [Testing and Validation](#testing-and-validation)

---

## Overview

### Objective

Integrate a local LLM (chatgpt-oss or equivalent) that acts as a trading copilot, able to:

1. **Analyze ML signals** and explain them in natural language
2. **Make trading decisions** based on context
3. **Execute operations** via MetaTrader4
4. **Answer questions** about markets and strategies
5. **Manage alerts** and notifications

### High-Level Architecture

```
┌─────────────────────────────────────────────┐
│             LLM TRADING COPILOT             │
└────────────────────┬────────────────────────┘
                     ▼
┌─────────────────────────────────────────────┐
│  Chat Interface / CLI / Telegram Bot        │
│  (User Interface)                           │
└────────────────────┬────────────────────────┘
                     ▼
┌─────────────────────────────────────────────┐
│  LLM Service (FastAPI)                      │
│  Request Handler:                           │
│   - Parse user input                        │
│   - Load context from Redis                 │
│   - Build prompt                            │
└────────────────────┬────────────────────────┘
                     ▼
┌─────────────────────────────────────────────┐
│  LLM Engine (chatgpt-oss / Llama3)          │
│  Ollama / vLLM / llama.cpp                  │
│   - GPU: NVIDIA RTX 5060 Ti 16GB            │
│   - Context: 8K tokens                      │
│   - Response time: <3s                      │
└────────────────────┬────────────────────────┘
                     ▼
┌─────────────────────────────────────────────┐
│  Tool Executor                              │
│  get_signal | analyze_market |              │
│  execute_trade | portfolio_status           │
└────────────────────┬────────────────────────┘
                     ▼
┌─────────────────────────────────────────────┐
│  External Services                          │
│  ML Engine (signals)   PostgreSQL (data)    │
│  MetaTrader4 (trading) Redis (context)      │
└─────────────────────────────────────────────┘
```

---

## Hardware Specifications

### Available GPU

| Specification | Value |
|---------------|-------|
| **Model** | NVIDIA RTX 5060 Ti |
| **VRAM** | 16GB GDDR7 |
| **CUDA Cores** | TBD |
| **Tensor Cores** | Yes |

### Compatible Models

| Model | VRAM Req. | Context | Speed | Recommendation |
|-------|-----------|---------|-------|----------------|
| **Llama 3 8B** | ~10GB | 8K | ~20 tok/s | Recommended |
| **Mistral 7B** | ~8GB | 8K | ~25 tok/s | Alternative |
| **Qwen2 7B** | ~9GB | 32K | ~22 tok/s | Alternative |
| **Phi-3 Mini 3.8B** | ~4GB | 4K | ~40 tok/s | Backup |

### Optimal Configuration

```yaml
# Llama 3 8B Instruct
model:
  name: llama3:8b-instruct-q5_K_M
  quantization: Q5_K_M      # quality/VRAM balance
  context_length: 8192
  batch_size: 512

gpu:
  device: cuda:0
  memory_fraction: 0.85     # 13.6GB of 16GB
  offload: false

inference:
  temperature: 0.7
  top_p: 0.9
  max_tokens: 2048
  repeat_penalty: 1.1
```

---

## Integration Architecture

### Components

```
LLM SERVICE COMPONENTS

1. LLM Runtime (Ollama)
   ├── Model Server (GPU)
   ├── REST API (:11434)
   └── Model Management

2. Trading Service (FastAPI)
   ├── Chat Endpoint (/api/chat)
   ├── Tool Executor
   └── Response Formatter

3. Context Manager (Redis)
   ├── Conversation History
   ├── Market Context
   └── User Preferences

4. Trading Tools
   ├── get_ml_signal()
   ├── analyze_market()
   ├── execute_trade()
   ├── get_portfolio()
   └── set_alert()
```

### Request Flow

```
User Message
         │
         ▼
┌─────────────────┐
│ FastAPI Handler │
└────────┬────────┘
         ▼
┌─────────────────┐     ┌─────────────────┐
│ Load Context    │◄───▶│ Redis           │
│ from Redis      │     └─────────────────┘
└────────┬────────┘
         ▼
┌─────────────────┐
│ Build Prompt    │
│ (system + hist  │
│  + user msg)    │
└────────┬────────┘
         ▼
┌─────────────────┐     ┌─────────────────┐
│ LLM Inference   │◄───▶│ Ollama (GPU)    │
└────────┬────────┘     └─────────────────┘
         ▼
┌─────────────────┐
│ Parse Response  │
│ Check for Tools │
└────────┬────────┘
         ▼
     ┌───┴───┐
     │Tools? │
     └───┬───┘
    Yes  │  No ──▶ Return Response
         ▼
┌─────────────────┐
│ Execute Tools   │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Format Results  │
│ Send to LLM     │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Final Response  │
│ Save to Context │
└─────────────────┘
```

---

## Model Configuration

### Installing Ollama

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download the model
ollama pull llama3:8b-instruct-q5_K_M

# Verify GPU usage
ollama run --verbose llama3:8b-instruct-q5_K_M

# Check VRAM
nvidia-smi
```

### Configuration File

```yaml
# config/llm_config.yaml
llm:
  provider: ollama
  model: llama3:8b-instruct-q5_K_M
  base_url: http://localhost:11434

inference:
  temperature: 0.7
  top_p: 0.9
  max_tokens: 2048
  stop_sequences:
    - "Human:"

context:
  max_history: 10               # last 10 messages
  include_market_context: true
  include_portfolio: true

redis:
  host: localhost
  port: 6379
  db: 0
  prefix: "llm:"
  ttl: 3600                     # 1 hour

tools:
  timeout: 30                   # seconds per tool
  retry_count: 3
  parallel_execution: true

logging:
  level: INFO
  file: logs/llm_service.log
```

---

## Trading Tools

### Tool Definitions

```python
# tools/trading_tools.py
from typing import Dict, Any, Optional
import httpx


class TradingTools:
    """Trading tools available to the LLM."""

    def __init__(self, config: Dict):
        self.ml_engine_url = config['ml_engine_url']
        self.trading_url = config['trading_url']
        self.data_url = config['data_url']

    async def get_ml_signal(self, symbol: str, timeframe: str = "5m") -> Dict[str, Any]:
        """Fetch the current ML signal for a symbol.

        Args:
            symbol: Trading pair (XAUUSD, EURUSD, etc.)
            timeframe: Timeframe (5m, 15m, 1h)

        Returns:
            {
                "action": "LONG" | "SHORT" | "HOLD",
                "confidence": 0.78,
                "entry_price": 2650.50,
                "stop_loss": 2645.20,
                "take_profit": 2661.10,
                "risk_reward": 2.0,
                "amd_phase": "accumulation",
                "killzone": "london_open",
                "reasoning": ["AMD: Accumulation (78%)", ...]
            }
        """
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"{self.ml_engine_url}/api/signal",
                params={"symbol": symbol, "timeframe": timeframe}
            )
            return response.json()

    async def analyze_market(self, symbol: str) -> Dict[str, Any]:
        """Analyze the current market state.

        Returns:
            {
                "symbol": "XAUUSD",
                "current_price": 2650.50,
                "amd_phase": "accumulation",
                "ict_context": {
                    "killzone": "london_open",
                    "ote_zone": "discount",
                    "score": 0.72
                },
                "key_levels": {
                    "resistance": [2660.00, 2675.50],
                    "support": [2640.00, 2625.00]
                },
                "recent_signals": [...],
                "trend": "bullish",
                "volatility": "medium"
            }
        """
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"{self.ml_engine_url}/api/analysis",
                params={"symbol": symbol}
            )
            return response.json()

    async def execute_trade(
        self,
        symbol: str,
        action: str,
        size: float,
        stop_loss: float,
        take_profit: float,
        account_id: Optional[str] = None
    ) -> Dict[str, Any]:
        """Execute a trading operation.

        Args:
            symbol: Trading pair
            action: "BUY" or "SELL"
            size: Position size (lots)
            stop_loss: Stop-loss price
            take_profit: Take-profit price
            account_id: MT4 account ID (optional)

        Returns:
            {
                "success": true,
                "ticket": 123456,
                "executed_price": 2650.45,
                "slippage": 0.05,
                "message": "Order executed successfully"
"Order executed successfully" } """ async with httpx.AsyncClient() as client: response = await client.post( f"{self.trading_url}/api/trade", json={ "symbol": symbol, "action": action, "size": size, "stop_loss": stop_loss, "take_profit": take_profit, "account_id": account_id } ) return response.json() async def get_portfolio( self, account_id: Optional[str] = None ) -> Dict[str, Any]: """ Obtiene estado del portfolio Returns: { "balance": 10000.00, "equity": 10150.00, "margin": 500.00, "free_margin": 9650.00, "positions": [ { "ticket": 123456, "symbol": "XAUUSD", "type": "BUY", "size": 0.1, "open_price": 2640.00, "current_price": 2650.50, "profit": 105.00, "stop_loss": 2630.00, "take_profit": 2660.00 } ], "daily_pnl": 250.00, "weekly_pnl": 450.00 } """ async with httpx.AsyncClient() as client: response = await client.get( f"{self.trading_url}/api/portfolio", params={"account_id": account_id} ) return response.json() async def set_alert( self, symbol: str, condition: str, price: float, message: str ) -> Dict[str, Any]: """ Configura una alerta de precio Args: symbol: Par de trading condition: "above" o "below" price: Precio objetivo message: Mensaje de la alerta Returns: { "alert_id": "alert_123", "status": "active", "message": "Alert created successfully" } """ async with httpx.AsyncClient() as client: response = await client.post( f"{self.trading_url}/api/alerts", json={ "symbol": symbol, "condition": condition, "price": price, "message": message } ) return response.json() async def get_market_data( self, symbol: str, timeframe: str = "5m", bars: int = 100 ) -> Dict[str, Any]: """ Obtiene datos de mercado historicos Returns: { "symbol": "XAUUSD", "timeframe": "5m", "data": [ {"time": "2024-12-08 10:00", "open": 2648, "high": 2652, ...}, ... 
                ]
            }
        """
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"{self.data_url}/api/market_data",
                params={
                    "symbol": symbol,
                    "timeframe": timeframe,
                    "bars": bars
                }
            )
            return response.json()
```

### Tool Schema for the LLM

```python
# tools/tool_schema.py

TOOL_DEFINITIONS = [
    {
        "name": "get_ml_signal",
        "description": "Fetches the current trading signal generated by the ML models. Use this to get the system's recommendation.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Trading pair (XAUUSD, EURUSD, GBPUSD, USDJPY)",
                    "enum": ["XAUUSD", "EURUSD", "GBPUSD", "USDJPY"]
                },
                "timeframe": {
                    "type": "string",
                    "description": "Analysis timeframe",
                    "enum": ["5m", "15m", "1h"],
                    "default": "5m"
                }
            },
            "required": ["symbol"]
        }
    },
    {
        "name": "analyze_market",
        "description": "Analyzes the current market state, including AMD phase, ICT context, key levels and trend.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Trading pair",
                    "enum": ["XAUUSD", "EURUSD", "GBPUSD", "USDJPY"]
                }
            },
            "required": ["symbol"]
        }
    },
    {
        "name": "execute_trade",
        "description": "Executes a trading operation. IMPORTANT: always confirm with the user before executing.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Trading pair"
                },
                "action": {
                    "type": "string",
                    "description": "Order type",
                    "enum": ["BUY", "SELL"]
                },
                "size": {
                    "type": "number",
                    "description": "Size in lots (0.01 - 10.0)"
                },
                "stop_loss": {
                    "type": "number",
                    "description": "Stop-loss price"
                },
                "take_profit": {
                    "type": "number",
                    "description": "Take-profit price"
                }
            },
            "required": ["symbol", "action", "size", "stop_loss", "take_profit"]
        }
    },
    {
        "name": "get_portfolio",
        "description": "Fetches the current portfolio status, including balance, open positions and P&L.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "name": "set_alert",
        "description": "Creates a price alert for notification.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Trading pair"
                },
                "condition": {
                    "type": "string",
                    "enum": ["above", "below"]
                },
                "price": {
                    "type": "number",
                    "description": "Target price"
                },
                "message": {
                    "type": "string",
                    "description": "Alert message"
                }
            },
            "required": ["symbol", "condition", "price", "message"]
        }
    }
]
```

---

## System Prompt

### Main Prompt

```python
# prompts/system_prompt.py

SYSTEM_PROMPT = """You are Trading Platform AI, a trading copilot specializing in forex and precious-metals markets.
## Your Role
- Analyze signals from the ML models and explain them clearly
- Help make informed trading decisions
- Execute operations when the user authorizes them
- Manage risk and protect capital

## Knowledge
You have access to ML models that analyze:

- **AMD (Accumulation-Manipulation-Distribution)**: market phases
  - Accumulation: institutions accumulating, best for longs
  - Manipulation: stop hunts, avoid entries
  - Distribution: institutions selling, best for shorts
- **ICT Concepts**: killzones, OTE, Premium/Discount
  - London Open (02:00-05:00 EST): high probability
  - NY AM (08:30-11:00 EST): maximum liquidity
  - Discount Zone (0-50%): best for buys
  - Premium Zone (50-100%): best for sells
- **SMC**: BOS, CHOCH, Inducement, Displacement

## Available Tools
1. `get_ml_signal(symbol, timeframe)` - Fetch the current ML signal
2. `analyze_market(symbol)` - Full market analysis
3. `execute_trade(symbol, action, size, sl, tp)` - Execute an operation
4. `get_portfolio()` - Portfolio status
5. `set_alert(symbol, condition, price, message)` - Create an alert

## Important Rules
1. **ALWAYS** explain the reasoning behind every recommendation
2. **NEVER** execute trades without explicit confirmation from the user
3. **ALWAYS** mention the risk (R:R, % of account)
4. **PRIORITIZE** capital preservation
5. If model confidence is <60%, recommend WAITING
6. During a Manipulation phase, recommend NOT TRADING

## Response Format for Signals
When you present a signal, use this format:

**Signal: [LONG/SHORT/HOLD]**
- Symbol: [SYMBOL]
- Confidence: [X]%
- AMD phase: [phase]
- Killzone: [killzone]

**Levels:**
- Entry: [price]
- Stop Loss: [price] ([X] pips)
- Take Profit: [price] ([X] pips)
- R:R: [X]:1

**Reasoning:**
1. [reason 1]
2. [reason 2]
3. [reason 3]

**Risk:** [X]% of the account

---

## Current Context
{market_context}

## Portfolio
{portfolio_context}
"""


def build_system_prompt(market_context: dict = None, portfolio_context: dict = None) -> str:
    """Build the system prompt with current context."""
    market_str = ""
    if market_context:
        market_str = f"""
Current price: {market_context.get('current_price', 'N/A')}
AMD phase: {market_context.get('amd_phase', 'N/A')}
Killzone: {market_context.get('killzone', 'N/A')}
Trend: {market_context.get('trend', 'N/A')}
"""

    portfolio_str = ""
    if portfolio_context:
        portfolio_str = f"""
Balance: ${portfolio_context.get('balance', 0):,.2f}
Equity: ${portfolio_context.get('equity', 0):,.2f}
Open positions: {len(portfolio_context.get('positions', []))}
Daily P&L: ${portfolio_context.get('daily_pnl', 0):,.2f}
"""

    return SYSTEM_PROMPT.format(
        market_context=market_str or "Not available",
        portfolio_context=portfolio_str or "Not available"
    )
```

---

## API Endpoints

### FastAPI Service

```python
# services/llm_service.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import redis
import json

app = FastAPI(title="Trading Platform LLM Service")


class Message(BaseModel):
    role: str  # "user" or "assistant"
    content: str


class ChatRequest(BaseModel):
    message: str
    session_id: str
    symbol: Optional[str] = "XAUUSD"


class ChatResponse(BaseModel):
    response: str
    tools_used: List[str]
    signal: Optional[dict] = None


# Redis connection
redis_client = redis.Redis(host='localhost', port=6379, db=0)


@app.post("/api/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    """Main chat endpoint."""
    try:
        # 1. Load conversation history
        history = load_conversation_history(request.session_id)

        # 2. Get market context
        market_context = await get_market_context(request.symbol)

        # 3. Get portfolio context
        portfolio_context = await get_portfolio_context()

        # 4. Build prompt
        system_prompt = build_system_prompt(market_context, portfolio_context)

        # 5. Call LLM
        response, tools_used = await call_llm(
            system_prompt=system_prompt,
            history=history,
            user_message=request.message
        )

        # 6. Save to history
        save_conversation(request.session_id, request.message, response)

        # 7. Extract signal if present
        signal = extract_signal_from_response(response)

        return ChatResponse(
            response=response,
            tools_used=tools_used,
            signal=signal
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/api/health")
async def health():
    """Health check."""
    return {"status": "healthy", "llm_status": await check_llm_status()}


@app.post("/api/clear_history")
async def clear_history(session_id: str):
    """Clear the conversation history."""
    redis_client.delete(f"llm:history:{session_id}")
    return {"status": "cleared"}


async def call_llm(
    system_prompt: str,
    history: List[Message],
    user_message: str
) -> tuple[str, List[str]]:
    """Call the LLM via Ollama."""
    import httpx

    # Build messages
    messages = [{"role": "system", "content": system_prompt}]
    for msg in history:
        messages.append({"role": msg.role, "content": msg.content})
    messages.append({"role": "user", "content": user_message})

    # Call Ollama
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            "http://localhost:11434/api/chat",
            json={
                "model": "llama3:8b-instruct-q5_K_M",
                "messages": messages,
                "stream": False,
                "options": {
                    "temperature": 0.7,
                    "top_p": 0.9,
                    "num_predict": 2048
                }
            }
        )

    result = response.json()
    llm_response = result["message"]["content"]

    # Check for tool calls (process_tool_calls returns the response
    # unchanged when no tool-call pattern is present)
    llm_response, tools_used = await process_tool_calls(llm_response)

    return llm_response, tools_used


async def process_tool_calls(response: str) -> tuple[str, List[str]]:
    """Process tool calls embedded in the response."""
    import re

    tools_used = []
    tool_pattern = r"(\w+)\((.*?)\)"
    matches = re.findall(tool_pattern, response)

    for tool_name, args_str in matches:
        tools_used.append(tool_name)

        # Parse arguments (args_str is expected as JSON-style
        # key/value pairs, e.g. "symbol": "XAUUSD")
        args = json.loads(f"{{{args_str}}}")

        # Execute tool
        tool_result = await execute_tool(tool_name, args)

        # Replace the call in the response with its result
        response = response.replace(
            f"{tool_name}({args_str})",
            f"\n**{tool_name} result:**\n```json\n{json.dumps(tool_result, indent=2)}\n```\n"
        )

    return response, tools_used
```

---

## Context Management

### Redis Schema

```python
# context/redis_schema.py
"""
Redis key structure:

llm:history:{session_id}       - Conversation history (LIST)
llm:market_context:{symbol}    - Market context cache (STRING, TTL=60s)
llm:portfolio:{user_id}        - Portfolio cache (STRING, TTL=30s)
llm:user_prefs:{user_id}       - User preferences (HASH)
llm:alerts:{user_id}           - Active alerts (SET)
"""
import redis
from typing import List, Dict, Optional
import json


class ContextManager:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.history_ttl = 3600  # 1 hour
        self.cache_ttl = 60      # 60 seconds

    def get_conversation_history(self, session_id: str, max_messages: int = 10) -> List[Dict]:
        """Fetch the conversation history."""
        key = f"llm:history:{session_id}"
        history = self.redis.lrange(key, -max_messages, -1)
        return [json.loads(h) for h in history]

    def add_to_history(self, session_id: str, user_message: str, assistant_response: str):
        """Append a message pair to the history."""
        key = f"llm:history:{session_id}"
        self.redis.rpush(key, json.dumps({
            "role": "user",
            "content": user_message
        }))
        self.redis.rpush(key, json.dumps({
            "role": "assistant",
            "content": assistant_response
        }))
        # Keep only the last N messages
        self.redis.ltrim(key, -20, -1)
        self.redis.expire(key, self.history_ttl)

    def get_market_context(self, symbol: str) -> Optional[Dict]:
        """Fetch the cached market context."""
        key = f"llm:market_context:{symbol}"
        data = self.redis.get(key)
        return json.loads(data) if data else None

    def set_market_context(self, symbol: str, context: Dict):
        """Cache the market context."""
        key = f"llm:market_context:{symbol}"
        self.redis.setex(key, self.cache_ttl, json.dumps(context))

    def get_user_preferences(self, user_id: str) -> Dict:
        """Fetch user preferences."""
        key = f"llm:user_prefs:{user_id}"
        return self.redis.hgetall(key) or {}

    def set_user_preference(self, user_id: str, pref: str, value: str):
        """Set a user preference."""
        key = f"llm:user_prefs:{user_id}"
        self.redis.hset(key, pref, value)
```

---

## Implementation

### Docker Compose

**IMPORTANT:** Ports must follow the policy defined in `/core/devtools/environment/DEVENV-PORTS.md`.

**Ports assigned to trading-platform:**
- Base range: 3600
- Frontend: 5179
- Backend API: 3600
- Database: 5438 (or shared 5432)
- Redis: 6385
- MinIO: 9600/9601

```yaml
# docker-compose.llm.yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: trading-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

  llm-service:
    build:
      context: .
      dockerfile: Dockerfile.llm
    container_name: trading-llm
    ports:
      - "3602:3602"  # LLM service (base 3600 + 2)
    environment:
      - OLLAMA_URL=http://ollama:11434
      - REDIS_URL=redis://redis:6379  # container port; 6385 is only the host mapping
      - ML_ENGINE_URL=http://ml-engine:3601
      - TRADING_URL=http://trading-service:3603
    depends_on:
      - ollama
      - redis
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    container_name: trading-redis
    ports:
      - "6385:6379"  # Host port assigned per DEVENV-PORTS
    volumes:
      - redis_data:/data
    restart: unless-stopped

volumes:
  ollama_data:
  redis_data:
```

### Initialization Script

```bash
#!/bin/bash
# scripts/init_llm.sh

echo "=== Trading Platform LLM Setup ==="

# 1. Check GPU
echo "Checking GPU..."
nvidia-smi

# 2. Start Ollama
echo "Starting Ollama..."
ollama serve &
sleep 5

# 3. Pull model
echo "Pulling Llama3 model..."
ollama pull llama3:8b-instruct-q5_K_M

# 4. Test model
echo "Testing model..."
ollama run llama3:8b-instruct-q5_K_M "Hello, respond with OK if working"

# 5. Start services
echo "Starting LLM service..."
docker-compose -f docker-compose.llm.yaml up -d

echo "=== Setup Complete ==="
echo "LLM Service: http://localhost:3602"
echo "Ollama API: http://localhost:11434"
```

---

## Testing and Validation

### Test Cases

```python
# tests/test_llm_service.py
import pytest
import httpx

LLM_URL = "http://localhost:3602"  # Port assigned per DEVENV-PORTS (base 3600 + 2)


@pytest.mark.asyncio
async def test_health_check():
    """Test health endpoint."""
    async with httpx.AsyncClient() as client:
        response = await client.get(f"{LLM_URL}/api/health")
        assert response.status_code == 200
        assert response.json()["status"] == "healthy"


@pytest.mark.asyncio
async def test_simple_chat():
    """Test basic chat."""
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            f"{LLM_URL}/api/chat",
            json={
                "message": "Hi, how are you?",
                "session_id": "test_session"
            }
        )
        assert response.status_code == 200
        assert len(response.json()["response"]) > 0


@pytest.mark.asyncio
async def test_get_signal():
    """Test signal retrieval via the LLM."""
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            f"{LLM_URL}/api/chat",
            json={
                "message": "Give me the current signal for XAUUSD",
                "session_id": "test_signal",
                "symbol": "XAUUSD"
            }
        )
        assert response.status_code == 200
        data = response.json()
        assert "get_ml_signal" in data["tools_used"]


@pytest.mark.asyncio
async def test_response_time():
    """Test that responses stay within the latency budget."""
    import time
    async with httpx.AsyncClient(timeout=60.0) as client:
        start = time.time()
        response = await client.post(
            f"{LLM_URL}/api/chat",
            json={
                "message": "Analyze the XAUUSD market",
                "session_id": "test_perf"
            }
        )
        elapsed = time.time() - start
        assert response.status_code == 200
        assert elapsed < 5.0  # 5 s max (including tool calls)
```

### Validation Metrics

| Metric | Target | How to Measure |
|--------|--------|----------------|
| Response Time | <3s | pytest benchmark |
| Tool Accuracy | >95% | Manual review |
| Context Retention | 100% | Test history |
| GPU Memory | <14GB | nvidia-smi |
| Uptime | >99% | Monitoring |

---

**Document generated:** 2025-12-08
**Trading Strategist - Trading Platform**
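The `/api/chat` handler calls an `extract_signal_from_response` helper that this document references but never defines. A minimal sketch, assuming the reply follows the signal template the system prompt asks for (the regexes and returned field names are illustrative, not the project's confirmed implementation):

```python
import re
from typing import Optional

# Patterns matching the "**Signal: [LONG/SHORT/HOLD]**" block
# requested by the system prompt (assumed format).
SIGNAL_RE = re.compile(r"\*\*Signal:\s*(LONG|SHORT|HOLD)\*\*", re.IGNORECASE)
CONF_RE = re.compile(r"Confidence:\s*([\d.]+)\s*%")


def extract_signal_from_response(response: str) -> Optional[dict]:
    """Extract the structured signal block from an LLM reply, if present.

    Returns None when the reply contains no signal block, so the
    ChatResponse.signal field stays null for ordinary conversation.
    """
    match = SIGNAL_RE.search(response)
    if not match:
        return None
    signal = {"action": match.group(1).upper()}
    conf = CONF_RE.search(response)
    if conf:
        # Normalize "78%" to the 0-1 scale used by get_ml_signal
        signal["confidence"] = float(conf.group(1)) / 100.0
    return signal
```

Parsing the formatted text is inherently brittle; an alternative is to have `call_llm` return the raw tool result from `get_ml_signal` alongside the prose and skip re-parsing entirely.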