local-llm-agent/docs/80-referencias/API-REFERENCE.md
Adrian Flores Cortes 3def230d58 Initial commit: local-llm-agent infrastructure project
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:42:45 -06:00


# API Reference - Local LLM Agent
**Version:** 1.0.0
**Base URL:** `http://localhost:3160`
**Date:** 2026-01-20
---
## 1. OVERVIEW
Local LLM Agent exposes an OpenAI-compatible REST API for transparent integration with existing clients.
### 1.1 Base URLs
| Service | URL | Description |
|---------|-----|-------------|
| API Gateway | `http://localhost:3160` | Main entry point |
| Inference Engine | `http://localhost:3161` | Backend (internal network only) |
| Ollama | `http://localhost:11434` | Runtime (host only) |
### 1.2 Content-Type
All requests must use:
```
Content-Type: application/json
```
### 1.3 Authentication
**MVP:** No authentication required (trusted local network)
**Phase 2:** Optional `X-API-Key` header
---
## 2. ENDPOINTS
### 2.1 Chat Completions
#### POST /v1/chat/completions
Creates a chat completion from the provided messages.
**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Host: localhost:3160
Content-Type: application/json
```
**Request Parameters:**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| model | string | Yes | - | ID of the model to use |
| messages | array | Yes | - | List of messages |
| max_tokens | integer | No | 512 | Maximum number of tokens to generate |
| temperature | number | No | 0.7 | Sampling temperature (0.0-2.0) |
| top_p | number | No | 0.9 | Top-p sampling (0.0-1.0) |
| stream | boolean | No | false | Streaming (not supported in the MVP) |
**Message Object:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| role | string | Yes | "system", "user", or "assistant" |
| content | string | Yes | Message content |
| name | string | No | Name of the sender |
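For reference, a minimal request body combining the parameters above (values are illustrative):

```json
{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100,
  "temperature": 0.7
}
```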
**Response (200 OK):**
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706000000,
  "model": "gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  }
}
```
**Response Fields:**
| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | List of generated choices |
| choices[].index | integer | Choice index |
| choices[].message | object | Generated message |
| choices[].finish_reason | string | "stop" or "length" |
| usage | object | Token usage statistics |
**Errors:**
| Status | Code | Description |
|--------|------|-------------|
| 400 | invalid_request | Malformed request |
| 404 | model_not_found | Model not available |
| 413 | context_length_exceeded | Context too long |
| 500 | internal_error | Internal error |
| 503 | backend_unavailable | Backend unavailable |
| 504 | inference_timeout | Inference timeout |
---
### 2.2 Models
#### GET /v1/models
Lists the available models.
**Request:**
```http
GET /v1/models HTTP/1.1
Host: localhost:3160
```
**Response (200 OK):**
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss-20b",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    },
    {
      "id": "gpt-oss-20b:erp-core",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    }
  ]
}
```
**Response Fields:**
| Field | Type | Description |
|-------|------|-------------|
| object | string | Always "list" |
| data | array | List of models |
| data[].id | string | Model ID |
| data[].object | string | Always "model" |
| data[].created | integer | Unix timestamp |
| data[].owned_by | string | Owner ("ollama") |
---
### 2.3 Health
#### GET /health
Returns the service health status.
**Request:**
```http
GET /health HTTP/1.1
Host: localhost:3160
```
**Response (200 OK - Healthy):**
```json
{
  "status": "healthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "up"
  }
}
```
**Response (503 Service Unavailable - Unhealthy):**
```json
{
  "status": "unhealthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "down"
  }
}
```
**Status Values:**
| Status | Description |
|--------|-------------|
| healthy | All components operational |
| degraded | Some components experiencing problems |
| unhealthy | Service not operational |
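The gateway's exact aggregation logic is not specified here; as a sketch, one plausible mapping from dependency states to an overall status (an assumption for illustration, not the actual implementation):

```python
# Critical dependencies: if any of these is down, the service
# cannot serve inference at all (assumed set, for illustration).
CRITICAL = {"inference_engine", "ollama"}

def overall_status(dependencies: dict[str, str]) -> str:
    """Map per-dependency states ("up"/"down") to an overall status.

    Assumed policy: any critical dependency down -> "unhealthy";
    any other dependency down -> "degraded"; otherwise "healthy".
    """
    down = {name for name, state in dependencies.items() if state != "up"}
    if down & CRITICAL:
        return "unhealthy"
    if down:
        return "degraded"
    return "healthy"
```

This is consistent with the 503 example above, where `ollama: down` yields `"unhealthy"`.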
---
### 2.4 MCP Tools (Phase 2)
#### GET /mcp/tools
Lists the available MCP tools.
**Request:**
```http
GET /mcp/tools HTTP/1.1
Host: localhost:3160
```
**Response (200 OK):**
```json
{
  "tools": [
    {
      "name": "classify",
      "description": "Classify text into predefined categories",
      "version": "1.0.0",
      "parameters": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "description": "Text to classify"
          },
          "categories": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Possible categories"
          }
        },
        "required": ["text", "categories"]
      }
    },
    {
      "name": "extract",
      "description": "Extract structured data from text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "summarize",
      "description": "Summarize text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "rewrite",
      "description": "Rewrite text with specific style",
      "version": "1.0.0",
      "parameters": {...}
    }
  ]
}
```
---
#### POST /mcp/tools/:name
Executes a specific MCP tool.
**Request:**
```http
POST /mcp/tools/classify HTTP/1.1
Host: localhost:3160
Content-Type: application/json
```
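The tool-specific request body is not documented above; based on the `classify` parameter schema returned by `GET /mcp/tools`, a request body would plausibly look like (illustrative values):

```json
{
  "text": "The app crashes when I click the save button",
  "categories": ["bug", "feature", "question"]
}
```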
**Response (200 OK):**
```json
{
  "category": "bug",
  "confidence": 0.92,
  "reasoning": "The text mentions 'bug' and describes a problem with functionality"
}
```
---
## 3. ERROR RESPONSES
### 3.1 Error Format
All error responses follow the OpenAI format:
```json
{
  "error": {
    "code": "error_code",
    "message": "Human readable message",
    "type": "error_type",
    "param": "parameter_name"
  }
}
```
### 3.2 Error Types
| Type | Description |
|------|-------------|
| invalid_request_error | Malformed request or invalid parameters |
| authentication_error | Authentication failed (Phase 2) |
| rate_limit_error | Rate limit exceeded (Phase 2) |
| server_error | Internal server error |
### 3.3 Error Codes
| Code | HTTP Status | Description |
|------|-------------|-------------|
| invalid_request | 400 | Invalid request |
| model_not_found | 404 | Model does not exist |
| context_length_exceeded | 413 | Context too long |
| rate_limited | 429 | Rate limit exceeded |
| backend_unavailable | 503 | Backend unavailable |
| inference_timeout | 504 | Inference timeout |
| internal_error | 500 | Internal error |
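Clients can branch on the `code` field of the error envelope from section 3.1. A minimal parsing sketch (the set of retryable codes is a client-side policy assumption, not mandated by the API):

```python
import json

# Codes plausibly worth retrying with backoff (assumed client policy).
RETRYABLE = {"rate_limited", "backend_unavailable", "inference_timeout"}

def parse_error(body: str) -> tuple[str, str, bool]:
    """Parse an OpenAI-style error envelope into (code, message, retryable)."""
    err = json.loads(body)["error"]
    return err["code"], err["message"], err["code"] in RETRYABLE

code, message, retryable = parse_error(
    '{"error": {"code": "backend_unavailable", '
    '"message": "Inference engine is not reachable", '
    '"type": "server_error", "param": null}}'
)
```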
---
## 4. RATE LIMITS (Phase 2)
| Tier | Requests/min | Tokens/min |
|------|--------------|------------|
| small | 40 | 20000 |
| main | 10 | 50000 |
---
## 5. HEADERS
### 5.1 Request Headers
| Header | Description | Required |
|--------|-------------|----------|
| Content-Type | application/json | Yes |
| X-API-Key | API key (Phase 2) | No |
| X-Tier | Forced tier (small/main) | No |
| X-Request-ID | ID for request tracking | No |
### 5.2 Response Headers
| Header | Description |
|--------|-------------|
| X-Request-ID | Request ID (generated when not provided) |
| X-Latency-Ms | Processing latency in milliseconds |
| X-Tier | Tier used for the request |
---
## 6. USAGE EXAMPLES
### 6.1 Python (OpenAI SDK)
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required"  # MVP does not require an API key
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
```
### 6.2 JavaScript (OpenAI SDK)
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:3160/v1',
  apiKey: 'not-required'
});

const response = await client.chat.completions.create({
  model: 'gpt-oss-20b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is 2+2?' }
  ],
  max_tokens: 100
});

console.log(response.choices[0].message.content);
```
### 6.3 cURL
```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```
### 6.4 Health Check
```bash
curl http://localhost:3160/health
```
---
## 7. AGENT INTEGRATION
### 7.1 Claude Code
```yaml
# Configuration in a Claude Code hook
mcp_servers:
  local-llm:
    url: http://localhost:3160
    capabilities:
      - chat
      - classify
      - extract
      - summarize
```
### 7.2 Trae IDE
```json
{
  "llm": {
    "provider": "openai-compatible",
    "baseUrl": "http://localhost:3160/v1",
    "model": "gpt-oss-20b"
  }
}
```
---
## 8. CHANGELOG
| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-01-20 | Initial version |
---
**Controlled Document**
- Author: Requirements-Analyst Agent
- Date: 2026-01-20