# API Reference - Local LLM Agent

**Version:** 1.0.0

**Base URL:** `http://localhost:3160`

**Date:** 2026-01-20

---
## 1. OVERVIEW

Local LLM Agent exposes an OpenAI-compatible REST API for transparent integration with existing clients.

### 1.1 Base URLs

| Service | URL | Description |
|---------|-----|-------------|
| API Gateway | `http://localhost:3160` | Main entry point |
| Inference Engine | `http://localhost:3161` | Backend (internal network only) |
| Ollama | `http://localhost:11434` | Runtime (host only) |

### 1.2 Content-Type

All requests must use:

```
Content-Type: application/json
```

### 1.3 Authentication

**MVP:** No authentication required (trusted local network)

**Phase 2:** Optional `X-API-Key` header

---
## 2. ENDPOINTS

### 2.1 Chat Completions

#### POST /v1/chat/completions

Creates a chat completion for the provided messages.

**Request:**

```http
POST /v1/chat/completions HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100
}
```

**Request Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| model | string | Yes | - | ID of the model to use |
| messages | array | Yes | - | List of messages |
| max_tokens | integer | No | 512 | Maximum tokens to generate |
| temperature | number | No | 0.7 | Temperature (0.0-2.0) |
| top_p | number | No | 0.9 | Top-p sampling (0.0-1.0) |
| stream | boolean | No | false | Streaming (not supported in MVP) |

**Message Object:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| role | string | Yes | "system", "user", or "assistant" |
| content | string | Yes | Message content |
| name | string | No | Name of the sender |

**Response (200 OK):**
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706000000,
  "model": "gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  }
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | List of generated responses |
| choices[].index | integer | Index of the choice |
| choices[].message | object | Generated message |
| choices[].finish_reason | string | "stop" or "length" |
| usage | object | Token usage statistics |

**Errors:**

| Status | Code | Description |
|--------|------|-------------|
| 400 | invalid_request | Malformed request |
| 404 | model_not_found | Model not available |
| 413 | context_length_exceeded | Context too long |
| 500 | internal_error | Internal error |
| 503 | backend_unavailable | Backend unavailable |
| 504 | inference_timeout | Inference timeout |

---
### 2.2 Models

#### GET /v1/models

Lists the available models.

**Request:**

```http
GET /v1/models HTTP/1.1
Host: localhost:3160
```

**Response (200 OK):**

```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss-20b",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    },
    {
      "id": "gpt-oss-20b:erp-core",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    }
  ]
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| object | string | Always "list" |
| data | array | List of models |
| data[].id | string | Model ID |
| data[].object | string | Always "model" |
| data[].created | integer | Unix timestamp |
| data[].owned_by | string | Owner ("ollama") |
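
The endpoint can be queried with any HTTP client. As a minimal sketch using only the Python standard library (the helper names `model_ids` and `list_models` are illustrative, not part of the API):

```python
import json
import urllib.request

def model_ids(payload: dict) -> list[str]:
    # Collect the "id" field of every model in a /v1/models response body.
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str = "http://localhost:3160") -> list[str]:
    # Fetch /v1/models and return the available model IDs.
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return model_ids(json.load(resp))

# Usage (requires a running gateway):
#   list_models()
```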
---
### 2.3 Health

#### GET /health

Checks the service status.

**Request:**

```http
GET /health HTTP/1.1
Host: localhost:3160
```

**Response (200 OK - Healthy):**

```json
{
  "status": "healthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "up"
  }
}
```

**Response (503 Service Unavailable - Unhealthy):**

```json
{
  "status": "unhealthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "down"
  }
}
```

**Status Values:**

| Status | Description |
|--------|-------------|
| healthy | All components operational |
| degraded | Some components degraded |
| unhealthy | Service not operational |
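
Because `/health` answers 503 with a JSON body when unhealthy, a client has to read the body from the error path as well. A minimal sketch using the standard library (helper names are illustrative):

```python
import json
import urllib.error
import urllib.request

def is_healthy(payload: dict) -> bool:
    # The service is fully usable only when the overall status is "healthy".
    return payload.get("status") == "healthy"

def check_health(base_url: str = "http://localhost:3160") -> dict:
    # /health returns 200 when healthy but 503 when unhealthy; the 503
    # still carries a JSON body, so decode it from the error object too.
    try:
        with urllib.request.urlopen(f"{base_url}/health") as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        return json.load(err)

# Usage (requires a running gateway):
#   is_healthy(check_health())
```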
---
### 2.4 MCP Tools (Phase 2)

#### GET /mcp/tools

Lists the available MCP tools.

**Request:**

```http
GET /mcp/tools HTTP/1.1
Host: localhost:3160
```

**Response (200 OK):**

```json
{
  "tools": [
    {
      "name": "classify",
      "description": "Classify text into predefined categories",
      "version": "1.0.0",
      "parameters": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "description": "Text to classify"
          },
          "categories": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Possible categories"
          }
        },
        "required": ["text", "categories"]
      }
    },
    {
      "name": "extract",
      "description": "Extract structured data from text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "summarize",
      "description": "Summarize text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "rewrite",
      "description": "Rewrite text with specific style",
      "version": "1.0.0",
      "parameters": {...}
    }
  ]
}
```

---

#### POST /mcp/tools/:name

Executes a specific MCP tool.

**Request:**

```http
POST /mcp/tools/classify HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "text": "Found a bug: the save button does not work",
  "categories": ["bug", "feature", "question"]
}
```

**Response (200 OK):**

```json
{
  "category": "bug",
  "confidence": 0.92,
  "reasoning": "The text mentions 'bug' and describes a problem with functionality"
}
```
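
Assuming the Phase 2 endpoint behaves as described above, invoking a tool is a plain JSON POST. A hypothetical sketch (`tool_url` and `call_tool` are illustrative helpers, not part of the API):

```python
import json
import urllib.request

def tool_url(name: str, base_url: str = "http://localhost:3160") -> str:
    # Build the /mcp/tools/:name URL for a given tool.
    return f"{base_url}/mcp/tools/{name}"

def call_tool(name: str, arguments: dict,
              base_url: str = "http://localhost:3160") -> dict:
    # POST the tool arguments as JSON and return the decoded result.
    req = urllib.request.Request(
        tool_url(name, base_url),
        data=json.dumps(arguments).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (Phase 2, requires a running gateway):
#   call_tool("classify", {"text": "Found a bug: the save button does not work",
#                          "categories": ["bug", "feature", "question"]})
```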
---
## 3. ERROR RESPONSES

### 3.1 Error Format

All error responses follow the OpenAI format:

```json
{
  "error": {
    "code": "error_code",
    "message": "Human readable message",
    "type": "error_type",
    "param": "parameter_name"
  }
}
```

### 3.2 Error Types

| Type | Description |
|------|-------------|
| invalid_request_error | Malformed request or invalid parameters |
| authentication_error | Authentication failed (Phase 2) |
| rate_limit_error | Rate limit exceeded (Phase 2) |
| server_error | Internal server error |

### 3.3 Error Codes

| Code | HTTP Status | Description |
|------|-------------|-------------|
| invalid_request | 400 | Invalid request |
| model_not_found | 404 | Model does not exist |
| context_length_exceeded | 413 | Context too long |
| rate_limited | 429 | Rate limit exceeded |
| backend_unavailable | 503 | Backend unavailable |
| inference_timeout | 504 | Inference timeout |
| internal_error | 500 | Internal error |
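
A client can branch on these codes by parsing the error envelope from section 3.1. A sketch using only the standard library (the exception mapping at the end is an illustrative choice, not prescribed by the API):

```python
import json
import urllib.error
import urllib.request

def error_code(body: bytes) -> str:
    # Extract the machine-readable code from an OpenAI-style error envelope.
    return json.loads(body)["error"]["code"]

def create_completion(payload: dict,
                      base_url: str = "http://localhost:3160") -> dict:
    # POST a chat completion and surface the gateway's error code on failure.
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        code = error_code(err.read())
        if code in ("backend_unavailable", "inference_timeout"):
            raise RuntimeError(f"retryable gateway error: {code}") from err
        raise ValueError(f"gateway rejected request: {code}") from err
```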
---
## 4. RATE LIMITS (Phase 2)

| Tier | Requests/min | Tokens/min |
|------|--------------|------------|
| small | 40 | 20000 |
| main | 10 | 50000 |

---
## 5. HEADERS

### 5.1 Request Headers

| Header | Description | Required |
|--------|-------------|----------|
| Content-Type | application/json | Yes |
| X-API-Key | API key (Phase 2) | No |
| X-Tier | Forced tier (small/main) | No |
| X-Request-ID | ID for request tracking | No |

### 5.2 Response Headers

| Header | Description |
|--------|-------------|
| X-Request-ID | Request ID (generated if not provided) |
| X-Latency-Ms | Processing latency in milliseconds |
| X-Tier | Tier used for the request |
---
## 6. USAGE EXAMPLES

### 6.1 Python (OpenAI SDK)

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required"  # The MVP does not require an API key
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
```

### 6.2 JavaScript (OpenAI SDK)

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:3160/v1',
  apiKey: 'not-required'
});

const response = await client.chat.completions.create({
  model: 'gpt-oss-20b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is 2+2?' }
  ],
  max_tokens: 100
});

console.log(response.choices[0].message.content);
```

### 6.3 cURL

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```

### 6.4 Health Check

```bash
curl http://localhost:3160/health
```

---
## 7. AGENT INTEGRATION

### 7.1 Claude Code

```yaml
# Configuration in a Claude Code hook
mcp_servers:
  local-llm:
    url: http://localhost:3160
    capabilities:
      - chat
      - classify
      - extract
      - summarize
```

### 7.2 Trae IDE

```json
{
  "llm": {
    "provider": "openai-compatible",
    "baseUrl": "http://localhost:3160/v1",
    "model": "gpt-oss-20b"
  }
}
```

---
## 8. CHANGELOG

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-01-20 | Initial version |

---

**Controlled Document**

- Author: Requirements-Analyst Agent
- Date: 2026-01-20