Adrian Flores Cortes 3def230d58 Initial commit: local-llm-agent infrastructure project
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:42:45 -06:00

# Local LLM Agent - API Gateway
API Gateway for the Local LLM Agent, compatible with the OpenAI API standard.
## Quick Start
```bash
# Install dependencies
npm install

# Development
npm run start:dev

# Production
npm run build
npm run start:prod
```
## Endpoints
### OpenAI-Compatible
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/models` | GET | List models |
### Health
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Full health check |
| `/health/live` | GET | Liveness probe |
| `/health/ready` | GET | Readiness probe |
### MCP Tools (Phase 2)
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/mcp/tools` | GET | List tools |
| `/mcp/tools/:name` | POST | Execute a tool |
## Configuration
Environment variables (see `../../.env.example`):
```bash
# Gateway
GATEWAY_PORT=3160
# Inference Engine connection
INFERENCE_HOST=localhost
INFERENCE_PORT=3161
# Model
MODEL_NAME=gpt-oss-20b
# Tier Small
TIER_SMALL_MAX_TOKENS=512
TIER_SMALL_MAX_CONTEXT=4096
TIER_SMALL_LATENCY_TARGET_MS=500
# Tier Main
TIER_MAIN_MAX_TOKENS=2048
TIER_MAIN_MAX_CONTEXT=16384
TIER_MAIN_LATENCY_TARGET_MS=2000
```
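The tier settings above map naturally onto a small config object with the defaults shown. The sketch below is illustrative only; the `TierConfig` and `load_tier` names are hypothetical and not part of the gateway code:

```python
import os
from dataclasses import dataclass

@dataclass
class TierConfig:
    max_tokens: int
    max_context: int
    latency_target_ms: int

def load_tier(prefix: str) -> TierConfig:
    """Read one tier's settings from the environment, falling back to the documented defaults."""
    defaults = {
        "TIER_SMALL": TierConfig(512, 4096, 500),
        "TIER_MAIN": TierConfig(2048, 16384, 2000),
    }[prefix]
    return TierConfig(
        max_tokens=int(os.getenv(f"{prefix}_MAX_TOKENS", defaults.max_tokens)),
        max_context=int(os.getenv(f"{prefix}_MAX_CONTEXT", defaults.max_context)),
        latency_target_ms=int(
            os.getenv(f"{prefix}_LATENCY_TARGET_MS", defaults.latency_target_ms)
        ),
    )

small = load_tier("TIER_SMALL")
main = load_tier("TIER_MAIN")
```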
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                   API Gateway (3160)                    │
│                                                         │
│  ┌───────────────┐  ┌─────────────┐  ┌──────────────┐   │
│  │ OpenAI Compat │  │   Health    │  │  MCP Tools   │   │
│  │  Controller   │  │ Controller  │  │  Controller  │   │
│  └───────┬───────┘  └─────────────┘  └──────────────┘   │
│          │                                              │
│  ┌───────┴───────┐                                      │
│  │ Router Service│ ← Tier classification                │
│  └───────┬───────┘                                      │
│          │                                              │
└──────────┼──────────────────────────────────────────────┘
           │
┌──────────┴──────────────────────────────────────────────┐
│                 Inference Engine (3161)                 │
└─────────────────────────────────────────────────────────┘
```
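The Router Service decides which tier serves each request. The actual logic lives in `router.service.ts`; the Python sketch below is only a plausible heuristic (approximating tokens as characters / 4, using the tier limits from the configuration section), not the gateway's real algorithm:

```python
def classify_tier(messages, max_tokens=None,
                  small_max_context=4096, small_max_tokens=512):
    """Route a request to 'small' or 'main' by rough size heuristics.

    Hypothetical sketch: token count is approximated as chars / 4.
    A request stays on the small tier only if both the estimated
    context and the requested completion fit the small tier's limits.
    """
    approx_prompt_tokens = sum(len(m["content"]) for m in messages) // 4
    requested = max_tokens if max_tokens is not None else small_max_tokens
    if (approx_prompt_tokens + requested <= small_max_context
            and requested <= small_max_tokens):
        return "small"
    return "main"
```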
## Usage Examples
### Chat Completion
```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```
### With the OpenAI SDK (Python)
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required"
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
### Forcing a Tier
```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Quick task"}],
    "x_tier": "small"
  }'
```
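Since `x_tier` is just an extra field in the JSON body, it is easy to add programmatically. The helper below is hypothetical, purely for illustration:

```python
import json

def build_chat_request(messages, model="gpt-oss-20b", tier=None, **params):
    """Build a chat-completion body; include x_tier only when forcing a tier."""
    body = {"model": model, "messages": messages, **params}
    if tier is not None:
        body["x_tier"] = tier
    return json.dumps(body)

payload = build_chat_request(
    [{"role": "user", "content": "Quick task"}], tier="small"
)
```

With the official OpenAI Python SDK (v1+), the same field can be passed through the `extra_body` keyword of `client.chat.completions.create`.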
## Swagger
Interactive documentation is available at `http://localhost:3160/api`.
## Development
```bash
# Tests
npm run test

# Tests with coverage
npm run test:cov

# Lint
npm run lint

# Format
npm run format
```
## Structure
```
src/
├── main.ts                  # Bootstrap
├── app.module.ts            # Root module
├── health/                  # Health checks
│   ├── health.controller.ts
│   ├── health.service.ts
│   └── health.module.ts
├── openai-compat/           # OpenAI endpoints
│   ├── openai-compat.controller.ts
│   ├── openai-compat.service.ts
│   ├── openai-compat.module.ts
│   └── dto/
│       └── chat-completion.dto.ts
├── router/                  # Tier routing
│   ├── router.service.ts
│   └── router.module.ts
└── mcp/                     # MCP Tools (Phase 2)
    ├── mcp.controller.ts
    ├── mcp.service.ts
    ├── mcp.module.ts
    └── dto/
        └── mcp-tools.dto.ts
```