API Reference - Local LLM Agent

Version: 1.0.0
Base URL: http://localhost:3160
Date: 2026-01-20


1. OVERVIEW

Local LLM Agent exposes an OpenAI-compatible REST API, so existing OpenAI clients can integrate transparently.

1.1 Base URLs

| Service | URL | Description |
|---|---|---|
| API Gateway | http://localhost:3160 | Main entry point |
| Inference Engine | http://localhost:3161 | Backend (internal network only) |
| Ollama | http://localhost:11434 | Runtime (host only) |

1.2 Content-Type

All requests must use:

Content-Type: application/json

1.3 Authentication

MVP: no authentication required (trusted local network)

Phase 2: optional X-API-Key header


2. ENDPOINTS

2.1 Chat Completions

POST /v1/chat/completions

Creates a chat completion from the provided messages.

Request:

POST /v1/chat/completions HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9
}

Request Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | ID of the model to use |
| messages | array | Yes | - | List of messages |
| max_tokens | integer | No | 512 | Maximum tokens to generate |
| temperature | number | No | 0.7 | Temperature (0.0-2.0) |
| top_p | number | No | 0.9 | Top-p sampling (0.0-1.0) |
| stream | boolean | No | false | Streaming (not supported in MVP) |

Message Object:

| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | "system", "user", or "assistant" |
| content | string | Yes | Message content |
| name | string | No | Name of the sender |
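
The defaults and bounds in the tables above can be enforced client-side before a request ever reaches the gateway. The sketch below is illustrative (the helper name `build_chat_request` is hypothetical, not part of this API):

```python
def build_chat_request(model, messages, max_tokens=512, temperature=0.7, top_p=0.9):
    """Build a /v1/chat/completions payload, applying the documented defaults.

    Raises ValueError when a required field is missing or a parameter
    falls outside its documented range.
    """
    if not model or not messages:
        raise ValueError("'model' and 'messages' are required")
    for m in messages:
        if m.get("role") not in ("system", "user", "assistant"):
            raise ValueError(f"invalid role: {m.get('role')!r}")
        if "content" not in m:
            raise ValueError("each message requires 'content'")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
```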

Response (200 OK):

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706000000,
  "model": "gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  }
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | List of generated responses |
| choices[].index | integer | Index of the choice |
| choices[].message | object | Generated message |
| choices[].finish_reason | string | "stop" or "length" |
| usage | object | Token usage statistics |

Errors:

| Status | Code | Description |
|---|---|---|
| 400 | invalid_request | Malformed request |
| 404 | model_not_found | Model not available |
| 413 | context_length_exceeded | Context too long |
| 500 | internal_error | Internal error |
| 503 | backend_unavailable | Backend unavailable |
| 504 | inference_timeout | Inference timeout |

2.2 Models

GET /v1/models

Lists the available models.

Request:

GET /v1/models HTTP/1.1
Host: localhost:3160

Response (200 OK):

{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss-20b",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    },
    {
      "id": "gpt-oss-20b:erp-core",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    }
  ]
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| object | string | Always "list" |
| data | array | List of models |
| data[].id | string | Model ID |
| data[].object | string | Always "model" |
| data[].created | integer | Unix timestamp |
| data[].owned_by | string | Owner ("ollama") |
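
For quick inspection, the `data[].id` values can be pulled out of the response body shown above. This is a client-side sketch, not endpoint behavior:

```python
def list_model_ids(models_response):
    """Return the model IDs from a GET /v1/models response body."""
    if models_response.get("object") != "list":
        raise ValueError("expected a model list response")
    return [m["id"] for m in models_response.get("data", [])]
```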

2.3 Health

GET /health

Reports the health of the service.

Request:

GET /health HTTP/1.1
Host: localhost:3160

Response (200 OK - Healthy):

{
  "status": "healthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "up"
  }
}

Response (503 Service Unavailable - Unhealthy):

{
  "status": "unhealthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "down"
  }
}

Status Values:

| Status | Description |
|---|---|
| healthy | All components operational |
| degraded | Some components impaired |
| unhealthy | Service not operational |
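
One way to read the examples above is as an aggregation rule over the dependencies map: a down critical dependency makes the whole service unhealthy, anything else down only degrades it. The sketch below is an illustrative reading consistent with the examples, not the gateway's actual source; the `CRITICAL` set is an assumption:

```python
# Assumption: both dependencies are required for inference to work at all.
CRITICAL = {"ollama", "inference_engine"}

def aggregate_status(dependencies):
    """Map a dependencies dict ({"service": "up" | "down"}) to an overall status."""
    down = {name for name, state in dependencies.items() if state != "up"}
    if down & CRITICAL:
        return "unhealthy"
    if down:
        return "degraded"
    return "healthy"
```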

2.4 MCP Tools (Phase 2)

GET /mcp/tools

Lists the available MCP tools.

Request:

GET /mcp/tools HTTP/1.1
Host: localhost:3160

Response (200 OK):

{
  "tools": [
    {
      "name": "classify",
      "description": "Classify text into predefined categories",
      "version": "1.0.0",
      "parameters": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "description": "Text to classify"
          },
          "categories": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Possible categories"
          }
        },
        "required": ["text", "categories"]
      }
    },
    {
      "name": "extract",
      "description": "Extract structured data from text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "summarize",
      "description": "Summarize text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "rewrite",
      "description": "Rewrite text with specific style",
      "version": "1.0.0",
      "parameters": {...}
    }
  ]
}

POST /mcp/tools/:name

Runs the specified MCP tool.

Request:

POST /mcp/tools/classify HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "text": "The customer reported a bug in the login form",
  "categories": ["bug", "feature", "question", "documentation"]
}

Response (200 OK):

{
  "category": "bug",
  "confidence": 0.92,
  "reasoning": "The text mentions 'bug' and describes a problem with functionality"
}
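
Because each tool's `parameters` object is a JSON Schema fragment, its `required` list can drive a minimal client-side check before POSTing. The helper below is a hypothetical sketch, not part of the gateway:

```python
def check_required(schema, args):
    """Return the required schema properties missing from args.

    schema is a tool's "parameters" object as returned by GET /mcp/tools.
    """
    return [name for name in schema.get("required", []) if name not in args]
```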

3. ERROR RESPONSES

3.1 Error Format

All error responses follow the OpenAI format:

{
  "error": {
    "code": "error_code",
    "message": "Human readable message",
    "type": "error_type",
    "param": "parameter_name"
  }
}

3.2 Error Types

| Type | Description |
|---|---|
| invalid_request_error | Malformed request or invalid parameters |
| authentication_error | Authentication failed (Phase 2) |
| rate_limit_error | Rate limit exceeded (Phase 2) |
| server_error | Internal server error |

3.3 Error Codes

| Code | HTTP Status | Description |
|---|---|---|
| invalid_request | 400 | Invalid request |
| model_not_found | 404 | Model does not exist |
| context_length_exceeded | 413 | Context too long |
| rate_limited | 429 | Rate limit exceeded |
| internal_error | 500 | Internal error |
| backend_unavailable | 503 | Backend unavailable |
| inference_timeout | 504 | Inference timeout |
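
A client can branch on `error.code` when a non-2xx response comes back. In the sketch below, the `ApiError` class is hypothetical, and treating the 429/503/504 codes as retryable is a client-side policy choice, not something this API mandates:

```python
# Client-side policy (assumption): these codes are worth retrying.
RETRYABLE = {"rate_limited", "backend_unavailable", "inference_timeout"}

class ApiError(Exception):
    """Raised from an OpenAI-style error body (see section 3.1)."""
    def __init__(self, code, message, retryable):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.retryable = retryable

def raise_for_error(body):
    """Raise ApiError from a parsed error response body."""
    err = body["error"]
    raise ApiError(err["code"], err["message"], err["code"] in RETRYABLE)
```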

4. RATE LIMITS (Phase 2)

| Tier | Requests/min | Tokens/min |
|---|---|---|
| small | 40 | 20000 |
| main | 10 | 50000 |
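
Once these limits land, clients will need to pace themselves on 429 responses. An exponential-backoff delay schedule is one common approach, sketched here as a suggestion rather than anything this API prescribes:

```python
def backoff_delays(retries, base=0.5, cap=30.0):
    """Exponential backoff schedule in seconds: base * 2**n, capped at cap.

    A caller would sleep for each delay in turn between retries of a
    429-rejected request.
    """
    return [min(cap, base * (2 ** n)) for n in range(retries)]
```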

5. HEADERS

5.1 Request Headers

| Header | Description | Required |
|---|---|---|
| Content-Type | application/json | Yes |
| X-API-Key | API key (Phase 2) | No |
| X-Tier | Forced tier (small/main) | No |
| X-Request-ID | Tracking ID | No |

5.2 Response Headers

| Header | Description |
|---|---|
| X-Request-ID | Request ID (generated if not provided) |
| X-Latency-Ms | Processing latency in milliseconds |
| X-Tier | Tier used for the request |
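
Since the gateway echoes `X-Request-ID` (generating one when absent), a client that wants end-to-end traceability can set it up front. A UUID is a reasonable choice, though the API does not require any particular format; the helper name is illustrative:

```python
import uuid

def with_request_id(headers=None):
    """Return a copy of headers with an X-Request-ID, generating one if missing."""
    headers = dict(headers or {})
    headers.setdefault("X-Request-ID", str(uuid.uuid4()))
    return headers
```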

6. USAGE EXAMPLES

6.1 Python (OpenAI SDK)

import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required"  # MVP does not require an API key
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)

6.2 JavaScript (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:3160/v1',
  apiKey: 'not-required'
});

const response = await client.chat.completions.create({
  model: 'gpt-oss-20b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is 2+2?' }
  ],
  max_tokens: 100
});

console.log(response.choices[0].message.content);

6.3 cURL

curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'

6.4 Health Check

curl http://localhost:3160/health

7. AGENT INTEGRATION

7.1 Claude Code

# Claude Code hook configuration
mcp_servers:
  local-llm:
    url: http://localhost:3160
    capabilities:
      - chat
      - classify
      - extract
      - summarize

7.2 Trae IDE

{
  "llm": {
    "provider": "openai-compatible",
    "baseUrl": "http://localhost:3160/v1",
    "model": "gpt-oss-20b"
  }
}

8. CHANGELOG

| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-20 | Initial release |

Controlled Document

  • Author: Requirements-Analyst Agent
  • Date: 2026-01-20