# Local LLM Agent - API Gateway

API Gateway for the Local LLM Agent, compatible with the OpenAI API standard.
## Quick Start

```bash
# Install dependencies
npm install

# Development
npm run start:dev

# Production
npm run build
npm run start:prod
```
## Endpoints

### OpenAI-Compatible

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/models` | GET | List models |
### Health

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Full health check |
| `/health/live` | GET | Liveness probe |
| `/health/ready` | GET | Readiness probe |
### MCP Tools (Phase 2)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/mcp/tools` | GET | List tools |
| `/mcp/tools/:name` | POST | Execute a tool |
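As a sketch of how a client might call the tool-execution endpoint, the helper below builds `fetch` arguments for `POST /mcp/tools/:name`. The tool name and argument shape in the usage note are hypothetical, and the Phase 2 endpoints may still change:

```typescript
// Illustrative sketch: build the URL and fetch options for executing an
// MCP tool. The base URL default and the argument shape are assumptions.
interface ToolCall {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildToolCall(
  name: string,
  args: Record<string, unknown>,
  baseUrl = "http://localhost:3160",
): ToolCall {
  return {
    url: `${baseUrl}/mcp/tools/${encodeURIComponent(name)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(args),
    },
  };
}
```

Usage (with a hypothetical `read_file` tool): `const { url, init } = buildToolCall("read_file", { path: "README.md" }); const res = await fetch(url, init);`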
## Configuration

Environment variables (see `../../.env.example`):

```bash
# Gateway
GATEWAY_PORT=3160

# Inference Engine connection
INFERENCE_HOST=localhost
INFERENCE_PORT=3161

# Model
MODEL_NAME=gpt-oss-20b

# Small tier
TIER_SMALL_MAX_TOKENS=512
TIER_SMALL_MAX_CONTEXT=4096
TIER_SMALL_LATENCY_TARGET_MS=500

# Main tier
TIER_MAIN_MAX_TOKENS=2048
TIER_MAIN_MAX_CONTEXT=16384
TIER_MAIN_LATENCY_TARGET_MS=2000
```
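The tier variables above can be read into a typed config object. A minimal sketch, assuming defaults equal to the documented values (the `TierConfig` shape and `loadTierConfig` helper are illustrative, not the gateway's actual code):

```typescript
// Illustrative sketch: parse the tier environment variables shown above.
// The names TierConfig / loadTierConfig are assumptions for illustration.
interface TierConfig {
  maxTokens: number;
  maxContext: number;
  latencyTargetMs: number;
}

function loadTierConfig(prefix: "SMALL" | "MAIN"): TierConfig {
  // Fall back to the documented defaults when a variable is unset.
  const defaults: Record<string, TierConfig> = {
    SMALL: { maxTokens: 512, maxContext: 4096, latencyTargetMs: 500 },
    MAIN: { maxTokens: 2048, maxContext: 16384, latencyTargetMs: 2000 },
  };
  const d = defaults[prefix];
  const num = (name: string, fallback: number): number => {
    const raw = process.env[`TIER_${prefix}_${name}`];
    return raw !== undefined ? Number(raw) : fallback;
  };
  return {
    maxTokens: num("MAX_TOKENS", d.maxTokens),
    maxContext: num("MAX_CONTEXT", d.maxContext),
    latencyTargetMs: num("LATENCY_TARGET_MS", d.latencyTargetMs),
  };
}
```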
## Architecture

```text
┌─────────────────────────────────────────────────────────┐
│                  API Gateway (3160)                     │
│                                                         │
│  ┌───────────────┐ ┌─────────────┐ ┌──────────────┐     │
│  │ OpenAI Compat │ │   Health    │ │  MCP Tools   │     │
│  │  Controller   │ │ Controller  │ │  Controller  │     │
│  └───────┬───────┘ └─────────────┘ └──────────────┘     │
│          │                                              │
│  ┌───────┴───────┐                                      │
│  │ Router Service│ ← Tier classification                │
│  └───────┬───────┘                                      │
│          │                                              │
└──────────┼──────────────────────────────────────────────┘
           │
           ▼
┌─────────────────────────────────────────────────────────┐
│               Inference Engine (3161)                   │
└─────────────────────────────────────────────────────────┘
```
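The diagram notes that the Router Service performs tier classification, but the heuristic is not specified here. One plausible sketch, under the assumption that an explicit `x_tier` field overrides routing and that short prompts go to the small tier (the character-based token estimate and the threshold are assumptions, not the actual implementation):

```typescript
// Illustrative tier-classification sketch. The x_tier override matches the
// request field shown in the usage examples; the length heuristic and the
// 1024-token threshold are assumptions for illustration only.
type Tier = "small" | "main";

interface ChatMessage { role: string; content: string; }
interface ChatRequest { messages: ChatMessage[]; x_tier?: Tier; }

function classifyTier(req: ChatRequest): Tier {
  // An explicit override always wins.
  if (req.x_tier) return req.x_tier;
  // Rough token estimate: ~4 characters per token.
  const chars = req.messages.reduce((n, m) => n + m.content.length, 0);
  const approxTokens = Math.ceil(chars / 4);
  // Short prompts fit comfortably in the small tier's 4096-token context.
  return approxTokens <= 1024 ? "small" : "main";
}
```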
## Usage Examples

### Chat Completion

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```
### With the OpenAI SDK (Python)

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required",
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
### Forcing a Tier

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Quick task"}],
    "x_tier": "small"
  }'
```
## Swagger

Interactive API documentation is available at: http://localhost:3160/api
## Development

```bash
# Tests
npm run test

# Tests with coverage
npm run test:cov

# Lint
npm run lint

# Format
npm run format
```
## Structure

```text
src/
├── main.ts                 # Bootstrap
├── app.module.ts           # Root module
├── health/                 # Health checks
│   ├── health.controller.ts
│   ├── health.service.ts
│   └── health.module.ts
├── openai-compat/          # OpenAI endpoints
│   ├── openai-compat.controller.ts
│   ├── openai-compat.service.ts
│   ├── openai-compat.module.ts
│   └── dto/
│       └── chat-completion.dto.ts
├── router/                 # Tier routing
│   ├── router.service.ts
│   └── router.module.ts
└── mcp/                    # MCP Tools (Phase 2)
    ├── mcp.controller.ts
    ├── mcp.service.ts
    ├── mcp.module.ts
    └── dto/
        └── mcp-tools.dto.ts
```