Adrian Flores Cortes 3def230d58 Initial commit: local-llm-agent infrastructure project
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:42:45 -06:00

# Local LLM Agent - API Gateway
API Gateway for the Local LLM Agent, compatible with the OpenAI API standard.
## Quick Start
```bash
# Install dependencies
npm install

# Development
npm run start:dev

# Production
npm run build
npm run start:prod
```
## Endpoints
### OpenAI-Compatible
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/models` | GET | List models |
### Health
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Full health check |
| `/health/live` | GET | Liveness probe |
| `/health/ready` | GET | Readiness probe |
### MCP Tools (Phase 2)
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/mcp/tools` | GET | List tools |
| `/mcp/tools/:name` | POST | Execute a tool |
## Configuration
Environment variables (see `../../.env.example`):
```bash
# Gateway
GATEWAY_PORT=3160
# Inference Engine connection
INFERENCE_HOST=localhost
INFERENCE_PORT=3161
# Model
MODEL_NAME=gpt-oss-20b
# Tier Small
TIER_SMALL_MAX_TOKENS=512
TIER_SMALL_MAX_CONTEXT=4096
TIER_SMALL_LATENCY_TARGET_MS=500
# Tier Main
TIER_MAIN_MAX_TOKENS=2048
TIER_MAIN_MAX_CONTEXT=16384
TIER_MAIN_LATENCY_TARGET_MS=2000
```
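The tier settings above map naturally onto a small config object with the defaults shown. The sketch below is illustrative only; the `TierConfig` and `load_tier` names are hypothetical and not part of the gateway code:

```python
import os
from dataclasses import dataclass

@dataclass
class TierConfig:
    max_tokens: int
    max_context: int
    latency_target_ms: int

def load_tier(prefix: str) -> TierConfig:
    """Read one tier's settings from the environment, falling back to the documented defaults."""
    defaults = {
        "TIER_SMALL": TierConfig(512, 4096, 500),
        "TIER_MAIN": TierConfig(2048, 16384, 2000),
    }[prefix]
    return TierConfig(
        max_tokens=int(os.getenv(f"{prefix}_MAX_TOKENS", defaults.max_tokens)),
        max_context=int(os.getenv(f"{prefix}_MAX_CONTEXT", defaults.max_context)),
        latency_target_ms=int(
            os.getenv(f"{prefix}_LATENCY_TARGET_MS", defaults.latency_target_ms)
        ),
    )

small = load_tier("TIER_SMALL")
main = load_tier("TIER_MAIN")
```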
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                   API Gateway (3160)                    │
│                                                         │
│  ┌───────────────┐  ┌─────────────┐  ┌──────────────┐   │
│  │ OpenAI Compat │  │   Health    │  │  MCP Tools   │   │
│  │  Controller   │  │ Controller  │  │  Controller  │   │
│  └───────┬───────┘  └─────────────┘  └──────────────┘   │
│          │                                              │
│  ┌───────┴───────┐                                      │
│  │ Router Service│ ← Tier classification                │
│  └───────┬───────┘                                      │
│          │                                              │
└──────────┼──────────────────────────────────────────────┘
           │
┌──────────┴──────────────────────────────────────────────┐
│                 Inference Engine (3161)                 │
└─────────────────────────────────────────────────────────┘
```
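The Router Service decides which tier serves each request. The actual logic lives in `router.service.ts`; the Python sketch below is only a plausible heuristic (approximating tokens as characters / 4, using the tier limits from the configuration section), not the gateway's real algorithm:

```python
def classify_tier(messages, max_tokens=None,
                  small_max_context=4096, small_max_tokens=512):
    """Route a request to 'small' or 'main' by rough size heuristics.

    Hypothetical sketch: token count is approximated as chars / 4.
    A request stays on the small tier only if both the estimated
    context and the requested completion fit the small tier's limits.
    """
    approx_prompt_tokens = sum(len(m["content"]) for m in messages) // 4
    requested = max_tokens if max_tokens is not None else small_max_tokens
    if (approx_prompt_tokens + requested <= small_max_context
            and requested <= small_max_tokens):
        return "small"
    return "main"
```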
## Usage Examples
### Chat Completion
```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```
### With the OpenAI SDK (Python)
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required"
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
### Forcing a Tier
```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Quick task"}],
    "x_tier": "small"
  }'
```
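Since `x_tier` is just an extra field in the JSON body, it is easy to add programmatically. The helper below is hypothetical, purely for illustration:

```python
import json

def build_chat_request(messages, model="gpt-oss-20b", tier=None, **params):
    """Build a chat-completion body; include x_tier only when forcing a tier."""
    body = {"model": model, "messages": messages, **params}
    if tier is not None:
        body["x_tier"] = tier
    return json.dumps(body)

payload = build_chat_request(
    [{"role": "user", "content": "Quick task"}], tier="small"
)
```

With the official OpenAI Python SDK (v1+), the same field can be passed through the `extra_body` keyword of `client.chat.completions.create`.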
## Swagger
Interactive documentation is available at `http://localhost:3160/api`.
## Development
```bash
# Tests
npm run test

# Tests with coverage
npm run test:cov

# Lint
npm run lint

# Format
npm run format
```
## Structure
```
src/
├── main.ts                  # Bootstrap
├── app.module.ts            # Root module
├── health/                  # Health checks
│   ├── health.controller.ts
│   ├── health.service.ts
│   └── health.module.ts
├── openai-compat/           # OpenAI endpoints
│   ├── openai-compat.controller.ts
│   ├── openai-compat.service.ts
│   ├── openai-compat.module.ts
│   └── dto/
│       └── chat-completion.dto.ts
├── router/                  # Tier routing
│   ├── router.service.ts
│   └── router.module.ts
└── mcp/                     # MCP Tools (Phase 2)
    ├── mcp.controller.ts
    ├── mcp.service.ts
    ├── mcp.module.ts
    └── dto/
        └── mcp-tools.dto.ts
```