# Local LLM Agent - API Gateway

API Gateway for the Local LLM Agent, compatible with the OpenAI API standard.
## Quick Start

```bash
# Install dependencies
npm install

# Development
npm run start:dev

# Production
npm run build
npm run start:prod
```
## Endpoints

### OpenAI-Compatible

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/models` | GET | List models |
### Health

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Full health check |
| `/health/live` | GET | Liveness probe |
| `/health/ready` | GET | Readiness probe |
### MCP Tools (Phase 2)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/mcp/tools` | GET | List tools |
| `/mcp/tools/:name` | POST | Execute a tool |
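As a sketch of how a client might call the tool-execution endpoint, the helper below builds `fetch` arguments for `POST /mcp/tools/:name`. The tool name and argument shape in the usage note are hypothetical, and the Phase 2 endpoints may still change:

```typescript
// Illustrative sketch: build the URL and fetch options for executing an
// MCP tool. The base URL default and the argument shape are assumptions.
interface ToolCall {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildToolCall(
  name: string,
  args: Record<string, unknown>,
  baseUrl = "http://localhost:3160",
): ToolCall {
  return {
    url: `${baseUrl}/mcp/tools/${encodeURIComponent(name)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(args),
    },
  };
}
```

Usage (with a hypothetical `read_file` tool): `const { url, init } = buildToolCall("read_file", { path: "README.md" }); const res = await fetch(url, init);`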
## Configuration

Environment variables (see `../../.env.example`):

```bash
# Gateway
GATEWAY_PORT=3160

# Inference Engine connection
INFERENCE_HOST=localhost
INFERENCE_PORT=3161

# Model
MODEL_NAME=gpt-oss-20b

# Small tier
TIER_SMALL_MAX_TOKENS=512
TIER_SMALL_MAX_CONTEXT=4096
TIER_SMALL_LATENCY_TARGET_MS=500

# Main tier
TIER_MAIN_MAX_TOKENS=2048
TIER_MAIN_MAX_CONTEXT=16384
TIER_MAIN_LATENCY_TARGET_MS=2000
```
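The tier variables above can be read into a typed config object. A minimal sketch, assuming defaults equal to the documented values (the `TierConfig` shape and `loadTierConfig` helper are illustrative, not the gateway's actual code):

```typescript
// Illustrative sketch: parse the tier environment variables shown above.
// The names TierConfig / loadTierConfig are assumptions for illustration.
interface TierConfig {
  maxTokens: number;
  maxContext: number;
  latencyTargetMs: number;
}

function loadTierConfig(prefix: "SMALL" | "MAIN"): TierConfig {
  // Fall back to the documented defaults when a variable is unset.
  const defaults: Record<string, TierConfig> = {
    SMALL: { maxTokens: 512, maxContext: 4096, latencyTargetMs: 500 },
    MAIN: { maxTokens: 2048, maxContext: 16384, latencyTargetMs: 2000 },
  };
  const d = defaults[prefix];
  const num = (name: string, fallback: number): number => {
    const raw = process.env[`TIER_${prefix}_${name}`];
    return raw !== undefined ? Number(raw) : fallback;
  };
  return {
    maxTokens: num("MAX_TOKENS", d.maxTokens),
    maxContext: num("MAX_CONTEXT", d.maxContext),
    latencyTargetMs: num("LATENCY_TARGET_MS", d.latencyTargetMs),
  };
}
```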
## Architecture

```text
┌─────────────────────────────────────────────────────────┐
│                  API Gateway (3160)                     │
│                                                         │
│  ┌───────────────┐ ┌─────────────┐ ┌──────────────┐     │
│  │ OpenAI Compat │ │   Health    │ │  MCP Tools   │     │
│  │  Controller   │ │ Controller  │ │  Controller  │     │
│  └───────┬───────┘ └─────────────┘ └──────────────┘     │
│          │                                              │
│  ┌───────┴───────┐                                      │
│  │ Router Service│ ← Tier classification                │
│  └───────┬───────┘                                      │
│          │                                              │
└──────────┼──────────────────────────────────────────────┘
           │
           ▼
┌─────────────────────────────────────────────────────────┐
│               Inference Engine (3161)                   │
└─────────────────────────────────────────────────────────┘
```
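The diagram notes that the Router Service performs tier classification, but the heuristic is not specified here. One plausible sketch, under the assumption that an explicit `x_tier` field overrides routing and that short prompts go to the small tier (the character-based token estimate and the threshold are assumptions, not the actual implementation):

```typescript
// Illustrative tier-classification sketch. The x_tier override matches the
// request field shown in the usage examples; the length heuristic and the
// 1024-token threshold are assumptions for illustration only.
type Tier = "small" | "main";

interface ChatMessage { role: string; content: string; }
interface ChatRequest { messages: ChatMessage[]; x_tier?: Tier; }

function classifyTier(req: ChatRequest): Tier {
  // An explicit override always wins.
  if (req.x_tier) return req.x_tier;
  // Rough token estimate: ~4 characters per token.
  const chars = req.messages.reduce((n, m) => n + m.content.length, 0);
  const approxTokens = Math.ceil(chars / 4);
  // Short prompts fit comfortably in the small tier's 4096-token context.
  return approxTokens <= 1024 ? "small" : "main";
}
```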
## Usage Examples

### Chat Completion

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```
### With the OpenAI SDK (Python)

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required",
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
### Forcing a Tier

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Quick task"}],
    "x_tier": "small"
  }'
```
## Swagger

Interactive API documentation is available at: http://localhost:3160/api
## Development

```bash
# Tests
npm run test

# Tests with coverage
npm run test:cov

# Lint
npm run lint

# Format
npm run format
```
## Structure

```text
src/
├── main.ts                 # Bootstrap
├── app.module.ts           # Root module
├── health/                 # Health checks
│   ├── health.controller.ts
│   ├── health.service.ts
│   └── health.module.ts
├── openai-compat/          # OpenAI endpoints
│   ├── openai-compat.controller.ts
│   ├── openai-compat.service.ts
│   ├── openai-compat.module.ts
│   └── dto/
│       └── chat-completion.dto.ts
├── router/                 # Tier routing
│   ├── router.service.ts
│   └── router.module.ts
└── mcp/                    # MCP Tools (Phase 2)
    ├── mcp.controller.ts
    ├── mcp.service.ts
    ├── mcp.module.ts
    └── dto/
        └── mcp-tools.dto.ts
```