API Reference - Local LLM Agent

Version: 1.0.0
Base URL: http://localhost:3160
Date: 2026-01-20


1. OVERVIEW

Local LLM Agent exposes an OpenAI-compatible REST API, so existing OpenAI clients can integrate transparently.

1.1 Base URLs

| Service | URL | Description |
|---|---|---|
| API Gateway | http://localhost:3160 | Main entry point |
| Inference Engine | http://localhost:3161 | Backend (internal network only) |
| Ollama | http://localhost:11434 | Runtime (host only) |

1.2 Content-Type

All requests must use:

Content-Type: application/json

1.3 Authentication

MVP: no authentication required (trusted local network)

Phase 2: optional X-API-Key header


2. ENDPOINTS

2.1 Chat Completions

POST /v1/chat/completions

Creates a chat completion from the provided messages.

Request:

POST /v1/chat/completions HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9
}

Request Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | ID of the model to use |
| messages | array | Yes | - | List of messages |
| max_tokens | integer | No | 512 | Maximum tokens to generate |
| temperature | number | No | 0.7 | Temperature (0.0-2.0) |
| top_p | number | No | 0.9 | Top-p sampling (0.0-1.0) |
| stream | boolean | No | false | Streaming (not supported in MVP) |

Message Object:

| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | "system", "user", or "assistant" |
| content | string | Yes | Message content |
| name | string | No | Name of the sender |
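
The defaults and bounds in the tables above can be enforced client-side before a request ever reaches the gateway. The sketch below is illustrative (the helper name `build_chat_request` is hypothetical, not part of this API):

```python
def build_chat_request(model, messages, max_tokens=512, temperature=0.7, top_p=0.9):
    """Build a /v1/chat/completions payload, applying the documented defaults.

    Raises ValueError when a required field is missing or a parameter
    falls outside its documented range.
    """
    if not model or not messages:
        raise ValueError("'model' and 'messages' are required")
    for m in messages:
        if m.get("role") not in ("system", "user", "assistant"):
            raise ValueError(f"invalid role: {m.get('role')!r}")
        if "content" not in m:
            raise ValueError("each message requires 'content'")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
```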

Response (200 OK):

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706000000,
  "model": "gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  }
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | List of generated responses |
| choices[].index | integer | Index of the choice |
| choices[].message | object | Generated message |
| choices[].finish_reason | string | "stop" or "length" |
| usage | object | Token usage statistics |

Errors:

| Status | Code | Description |
|---|---|---|
| 400 | invalid_request | Malformed request |
| 404 | model_not_found | Model not available |
| 413 | context_length_exceeded | Context too long |
| 500 | internal_error | Internal error |
| 503 | backend_unavailable | Backend unavailable |
| 504 | inference_timeout | Inference timeout |

2.2 Models

GET /v1/models

Lists the available models.

Request:

GET /v1/models HTTP/1.1
Host: localhost:3160

Response (200 OK):

{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss-20b",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    },
    {
      "id": "gpt-oss-20b:erp-core",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    }
  ]
}

Response Fields:

| Field | Type | Description |
|---|---|---|
| object | string | Always "list" |
| data | array | List of models |
| data[].id | string | Model ID |
| data[].object | string | Always "model" |
| data[].created | integer | Unix timestamp |
| data[].owned_by | string | Owner ("ollama") |
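
For quick inspection, the `data[].id` values can be pulled out of the response body shown above. This is a client-side sketch, not endpoint behavior:

```python
def list_model_ids(models_response):
    """Return the model IDs from a GET /v1/models response body."""
    if models_response.get("object") != "list":
        raise ValueError("expected a model list response")
    return [m["id"] for m in models_response.get("data", [])]
```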

2.3 Health

GET /health

Reports the health of the service.

Request:

GET /health HTTP/1.1
Host: localhost:3160

Response (200 OK - Healthy):

{
  "status": "healthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "up"
  }
}

Response (503 Service Unavailable - Unhealthy):

{
  "status": "unhealthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "down"
  }
}

Status Values:

| Status | Description |
|---|---|
| healthy | All components operational |
| degraded | Some components impaired |
| unhealthy | Service not operational |
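
One way to read the examples above is as an aggregation rule over the dependencies map: a down critical dependency makes the whole service unhealthy, anything else down only degrades it. The sketch below is an illustrative reading consistent with the examples, not the gateway's actual source; the `CRITICAL` set is an assumption:

```python
# Assumption: both dependencies are required for inference to work at all.
CRITICAL = {"ollama", "inference_engine"}

def aggregate_status(dependencies):
    """Map a dependencies dict ({"service": "up" | "down"}) to an overall status."""
    down = {name for name, state in dependencies.items() if state != "up"}
    if down & CRITICAL:
        return "unhealthy"
    if down:
        return "degraded"
    return "healthy"
```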

2.4 MCP Tools (Phase 2)

GET /mcp/tools

Lists the available MCP tools.

Request:

GET /mcp/tools HTTP/1.1
Host: localhost:3160

Response (200 OK):

{
  "tools": [
    {
      "name": "classify",
      "description": "Classify text into predefined categories",
      "version": "1.0.0",
      "parameters": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "description": "Text to classify"
          },
          "categories": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Possible categories"
          }
        },
        "required": ["text", "categories"]
      }
    },
    {
      "name": "extract",
      "description": "Extract structured data from text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "summarize",
      "description": "Summarize text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "rewrite",
      "description": "Rewrite text with specific style",
      "version": "1.0.0",
      "parameters": {...}
    }
  ]
}

POST /mcp/tools/:name

Runs the specified MCP tool.

Request:

POST /mcp/tools/classify HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "text": "The customer reported a bug in the login form",
  "categories": ["bug", "feature", "question", "documentation"]
}

Response (200 OK):

{
  "category": "bug",
  "confidence": 0.92,
  "reasoning": "The text mentions 'bug' and describes a problem with functionality"
}
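
Because each tool's `parameters` object is a JSON Schema fragment, its `required` list can drive a minimal client-side check before POSTing. The helper below is a hypothetical sketch, not part of the gateway:

```python
def check_required(schema, args):
    """Return the required schema properties missing from args.

    schema is a tool's "parameters" object as returned by GET /mcp/tools.
    """
    return [name for name in schema.get("required", []) if name not in args]
```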

3. ERROR RESPONSES

3.1 Error Format

All error responses follow the OpenAI format:

{
  "error": {
    "code": "error_code",
    "message": "Human readable message",
    "type": "error_type",
    "param": "parameter_name"
  }
}

3.2 Error Types

| Type | Description |
|---|---|
| invalid_request_error | Malformed request or invalid parameters |
| authentication_error | Authentication failed (Phase 2) |
| rate_limit_error | Rate limit exceeded (Phase 2) |
| server_error | Internal server error |

3.3 Error Codes

| Code | HTTP Status | Description |
|---|---|---|
| invalid_request | 400 | Invalid request |
| model_not_found | 404 | Model does not exist |
| context_length_exceeded | 413 | Context too long |
| rate_limited | 429 | Rate limit exceeded |
| internal_error | 500 | Internal error |
| backend_unavailable | 503 | Backend unavailable |
| inference_timeout | 504 | Inference timeout |
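
A client can branch on `error.code` when a non-2xx response comes back. In the sketch below, the `ApiError` class is hypothetical, and treating the 429/503/504 codes as retryable is a client-side policy choice, not something this API mandates:

```python
# Client-side policy (assumption): these codes are worth retrying.
RETRYABLE = {"rate_limited", "backend_unavailable", "inference_timeout"}

class ApiError(Exception):
    """Raised from an OpenAI-style error body (see section 3.1)."""
    def __init__(self, code, message, retryable):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.retryable = retryable

def raise_for_error(body):
    """Raise ApiError from a parsed error response body."""
    err = body["error"]
    raise ApiError(err["code"], err["message"], err["code"] in RETRYABLE)
```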

4. RATE LIMITS (Phase 2)

| Tier | Requests/min | Tokens/min |
|---|---|---|
| small | 40 | 20000 |
| main | 10 | 50000 |
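
Once these limits land, clients will need to pace themselves on 429 responses. An exponential-backoff delay schedule is one common approach, sketched here as a suggestion rather than anything this API prescribes:

```python
def backoff_delays(retries, base=0.5, cap=30.0):
    """Exponential backoff schedule in seconds: base * 2**n, capped at cap.

    A caller would sleep for each delay in turn between retries of a
    429-rejected request.
    """
    return [min(cap, base * (2 ** n)) for n in range(retries)]
```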

5. HEADERS

5.1 Request Headers

| Header | Description | Required |
|---|---|---|
| Content-Type | application/json | Yes |
| X-API-Key | API key (Phase 2) | No |
| X-Tier | Forced tier (small/main) | No |
| X-Request-ID | Tracking ID | No |

5.2 Response Headers

| Header | Description |
|---|---|
| X-Request-ID | Request ID (generated if not provided) |
| X-Latency-Ms | Processing latency in milliseconds |
| X-Tier | Tier used for the request |
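
Since the gateway echoes `X-Request-ID` (generating one when absent), a client that wants end-to-end traceability can set it up front. A UUID is a reasonable choice, though the API does not require any particular format; the helper name is illustrative:

```python
import uuid

def with_request_id(headers=None):
    """Return a copy of headers with an X-Request-ID, generating one if missing."""
    headers = dict(headers or {})
    headers.setdefault("X-Request-ID", str(uuid.uuid4()))
    return headers
```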

6. USAGE EXAMPLES

6.1 Python (OpenAI SDK)

import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required"  # MVP does not require an API key
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)

6.2 JavaScript (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:3160/v1',
  apiKey: 'not-required'
});

const response = await client.chat.completions.create({
  model: 'gpt-oss-20b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is 2+2?' }
  ],
  max_tokens: 100
});

console.log(response.choices[0].message.content);

6.3 cURL

curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'

6.4 Health Check

curl http://localhost:3160/health

7. AGENT INTEGRATION

7.1 Claude Code

# Claude Code hook configuration
mcp_servers:
  local-llm:
    url: http://localhost:3160
    capabilities:
      - chat
      - classify
      - extract
      - summarize

7.2 Trae IDE

{
  "llm": {
    "provider": "openai-compatible",
    "baseUrl": "http://localhost:3160/v1",
    "model": "gpt-oss-20b"
  }
}

8. CHANGELOG

| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-20 | Initial release |

Controlled Document

  • Author: Requirements-Analyst Agent
  • Date: 2026-01-20