# Local LLM Agent - API Gateway

API Gateway for Local LLM Agent, compatible with the OpenAI API standard.

## Quick Start

```bash
# Install dependencies
npm install

# Development
npm run start:dev

# Production
npm run build
npm run start:prod
```

## Endpoints

### OpenAI-Compatible

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/models` | GET | List models |

### Health

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Full health check |
| `/health/live` | GET | Liveness probe |
| `/health/ready` | GET | Readiness probe |

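
A client or deployment script may want to wait for the gateway before sending traffic. The following is a minimal sketch of polling the readiness probe with the Python standard library. It assumes that `/health/ready` answers with a 2xx status once the gateway is ready, which this README does not state; the helper names `is_ready` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url: str = "http://localhost:3160", timeout: float = 2.0) -> bool:
    """Return True if GET /health/ready answers with a 2xx status (assumed contract)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health/ready", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a 4xx/5xx response: treat as not ready.
        return False


def wait_until_ready(check=is_ready, attempts: int = 10, delay: float = 1.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the check as a parameter keeps the retry loop testable without a running gateway.
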
### MCP Tools (Phase 2)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/mcp/tools` | GET | List tools |
| `/mcp/tools/:name` | POST | Execute a tool |

## Configuration

Environment variables (see `../../.env.example`):

```bash
# Gateway
GATEWAY_PORT=3160

# Inference Engine connection
INFERENCE_HOST=localhost
INFERENCE_PORT=3161

# Model
MODEL_NAME=gpt-oss-20b

# Tier Small
TIER_SMALL_MAX_TOKENS=512
TIER_SMALL_MAX_CONTEXT=4096
TIER_SMALL_LATENCY_TARGET_MS=500

# Tier Main
TIER_MAIN_MAX_TOKENS=2048
TIER_MAIN_MAX_CONTEXT=16384
TIER_MAIN_LATENCY_TARGET_MS=2000
```

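
For tooling that sits next to the gateway (test harnesses, scripts), the same variables can be read into a typed structure. The sketch below is illustrative only: the class and function names are hypothetical, the fallback values are simply the example values listed above (whether the gateway itself applies these defaults is not stated here), and the `http` scheme for the inference URL is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    max_tokens: int
    max_context: int
    latency_target_ms: int


@dataclass(frozen=True)
class GatewaySettings:
    gateway_port: int
    inference_url: str
    model_name: str
    small: TierLimits
    main: TierLimits


def load_settings(env=os.environ) -> GatewaySettings:
    """Read the variables listed above; fall back to the README's example values."""

    def tier(prefix: str, tokens: int, context: int, latency: int) -> TierLimits:
        return TierLimits(
            max_tokens=int(env.get(f"{prefix}_MAX_TOKENS", tokens)),
            max_context=int(env.get(f"{prefix}_MAX_CONTEXT", context)),
            latency_target_ms=int(env.get(f"{prefix}_LATENCY_TARGET_MS", latency)),
        )

    host = env.get("INFERENCE_HOST", "localhost")
    port = int(env.get("INFERENCE_PORT", 3161))
    return GatewaySettings(
        gateway_port=int(env.get("GATEWAY_PORT", 3160)),
        inference_url=f"http://{host}:{port}",  # scheme is an assumption
        model_name=env.get("MODEL_NAME", "gpt-oss-20b"),
        small=tier("TIER_SMALL", 512, 4096, 500),
        main=tier("TIER_MAIN", 2048, 16384, 2000),
    )
```

Calling `load_settings()` with no argument reads the process environment; passing a plain dictionary makes it easy to exercise in tests.
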
## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                   API Gateway (3160)                    │
│                                                         │
│  ┌───────────────┐  ┌─────────────┐  ┌──────────────┐   │
│  │ OpenAI Compat │  │   Health    │  │  MCP Tools   │   │
│  │  Controller   │  │ Controller  │  │  Controller  │   │
│  └───────┬───────┘  └─────────────┘  └──────────────┘   │
│          │                                              │
│  ┌───────┴───────┐                                      │
│  │ Router Service│ ← Tier classification                │
│  └───────┬───────┘                                      │
│          │                                              │
└──────────┼──────────────────────────────────────────────┘
           │
           ▼
┌─────────────────────────────────────────────────────────┐
│                 Inference Engine (3161)                 │
└─────────────────────────────────────────────────────────┘
```

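
The diagram shows a Router Service that classifies each request into a tier, but this README does not describe the rule it applies. The sketch below shows one plausible rule, purely as an illustration and not as the gateway's actual logic: honor an explicit `x_tier`, otherwise pick `small` when the requested output and a rough prompt estimate fit within the Tier Small limits from the configuration. The function names, the four-characters-per-token estimate, and the `"main"` tier value are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    max_tokens: int
    max_context: int


# Example limits taken from the configuration section.
SMALL = TierLimits(max_tokens=512, max_context=4096)
MAIN = TierLimits(max_tokens=2048, max_context=16384)


def estimate_tokens(messages) -> int:
    """Crude estimate: roughly 4 characters per token (assumption, not a tokenizer)."""
    chars = sum(len(str(m.get("content") or "")) for m in messages)
    return max(1, chars // 4)


def classify_tier(request: dict) -> str:
    """Hypothetical routing rule; the real Router Service may differ."""
    forced = request.get("x_tier")
    if forced in ("small", "main"):
        return forced  # an explicit tier wins over any heuristic
    prompt_tokens = estimate_tokens(request.get("messages", []))
    wanted = request.get("max_tokens", SMALL.max_tokens)
    fits_output = wanted <= SMALL.max_tokens
    fits_context = prompt_tokens + wanted <= SMALL.max_context
    return "small" if fits_output and fits_context else "main"
```

A real implementation would likely use the model's tokenizer and might also consider the latency targets, which this sketch ignores.
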
## Usage Example

### Chat Completion

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```

### With the OpenAI SDK (Python)

```python
import openai

# Point the SDK at the local gateway instead of the OpenAI API.
client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required",  # placeholder value; the SDK needs a non-empty key
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

### Force a Tier

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Quick task"}],
    "x_tier": "small"
  }'
```

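
The same request can be built from Python without extra dependencies. This is a minimal sketch using only the standard library; `build_chat_request` and `send` are hypothetical helper names, and the response is assumed to be a JSON body as in the OpenAI-compatible format.

```python
import json
import urllib.request


def build_chat_request(content, tier=None, model="gpt-oss-20b",
                       base_url="http://localhost:3160"):
    """Build a POST to /v1/chat/completions, adding x_tier only when requested."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    if tier is not None:
        payload["x_tier"] = tier  # non-standard field understood by this gateway
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and decode the JSON response (requires a running gateway)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `send(build_chat_request("Quick task", tier="small"))` mirrors the curl call above. With the OpenAI Python SDK, extra fields such as `x_tier` can usually be passed through the `extra_body` argument of `client.chat.completions.create`.
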
## Swagger

Interactive documentation is available at: `http://localhost:3160/api`

## Development

```bash
# Tests
npm run test

# Tests with coverage
npm run test:cov

# Lint
npm run lint

# Format
npm run format
```

## Structure

```
src/
├── main.ts                 # Bootstrap
├── app.module.ts           # Root module
├── health/                 # Health checks
│   ├── health.controller.ts
│   ├── health.service.ts
│   └── health.module.ts
├── openai-compat/          # OpenAI endpoints
│   ├── openai-compat.controller.ts
│   ├── openai-compat.service.ts
│   ├── openai-compat.module.ts
│   └── dto/
│       └── chat-completion.dto.ts
├── router/                 # Tier routing
│   ├── router.service.ts
│   └── router.module.ts
└── mcp/                    # MCP Tools (Phase 2)
    ├── mcp.controller.ts
    ├── mcp.service.ts
    ├── mcp.module.ts
    └── dto/
        └── mcp-tools.dto.ts
```