# Local LLM Agent - API Gateway

API Gateway for Local LLM Agent, compatible with the OpenAI standard.

## Quick Start

```bash
# Install dependencies
npm install

# Development
npm run start:dev

# Production
npm run build
npm run start:prod
```

## Endpoints

### OpenAI-Compatible

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/models` | GET | List models |

### Health

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Full health check |
| `/health/live` | GET | Liveness probe |
| `/health/ready` | GET | Readiness probe |

### MCP Tools (Phase 2)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/mcp/tools` | GET | List tools |
| `/mcp/tools/:name` | POST | Execute a tool |

## Configuration

Environment variables (see `../../.env.example`):

```bash
# Gateway
GATEWAY_PORT=3160

# Inference Engine connection
INFERENCE_HOST=localhost
INFERENCE_PORT=3161

# Model
MODEL_NAME=gpt-oss-20b

# Tier Small
TIER_SMALL_MAX_TOKENS=512
TIER_SMALL_MAX_CONTEXT=4096
TIER_SMALL_LATENCY_TARGET_MS=500

# Tier Main
TIER_MAIN_MAX_TOKENS=2048
TIER_MAIN_MAX_CONTEXT=16384
TIER_MAIN_LATENCY_TARGET_MS=2000
```

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                   API Gateway (3160)                    │
│                                                         │
│  ┌───────────────┐  ┌─────────────┐  ┌──────────────┐   │
│  │ OpenAI Compat │  │   Health    │  │  MCP Tools   │   │
│  │  Controller   │  │ Controller  │  │  Controller  │   │
│  └───────┬───────┘  └─────────────┘  └──────────────┘   │
│          │                                              │
│  ┌───────┴───────┐                                      │
│  │ Router Service│ ← Tier classification                │
│  └───────┬───────┘                                      │
│          │                                              │
└──────────┼──────────────────────────────────────────────┘
           │
           ▼
┌─────────────────────────────────────────────────────────┐
│                 Inference Engine (3161)                 │
└─────────────────────────────────────────────────────────┘
```

## Usage Examples

### Chat Completion

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```

### With the OpenAI SDK (Python)

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required"
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

### Forcing a Tier

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Quick task"}],
    "x_tier": "small"
  }'
```

## Swagger

Interactive documentation is available at: `http://localhost:3160/api`

## Development

```bash
# Tests
npm run test

# Tests with coverage
npm run test:cov

# Lint
npm run lint

# Format
npm run format
```

## Structure

```
src/
├── main.ts                     # Bootstrap
├── app.module.ts               # Root module
├── health/                     # Health checks
│   ├── health.controller.ts
│   ├── health.service.ts
│   └── health.module.ts
├── openai-compat/              # OpenAI endpoints
│   ├── openai-compat.controller.ts
│   ├── openai-compat.service.ts
│   ├── openai-compat.module.ts
│   └── dto/
│       └── chat-completion.dto.ts
├── router/                     # Tier routing
│   ├── router.service.ts
│   └── router.module.ts
└── mcp/                        # MCP Tools (Phase 2)
    ├── mcp.controller.ts
    ├── mcp.service.ts
    ├── mcp.module.ts
    └── dto/
        └── mcp-tools.dto.ts
```