# API Reference - Local LLM Agent

**Version:** 1.0.0
**Base URL:** `http://localhost:3160`
**Date:** 2026-01-20

---

## 1. OVERVIEW

Local LLM Agent exposes an OpenAI-compatible REST API for transparent integration with existing clients.

### 1.1 Base URLs

| Service | URL | Description |
|---------|-----|-------------|
| API Gateway | `http://localhost:3160` | Main entry point |
| Inference Engine | `http://localhost:3161` | Backend (internal network only) |
| Ollama | `http://localhost:11434` | Runtime (host only) |

### 1.2 Content-Type

All requests must use:

```
Content-Type: application/json
```

### 1.3 Authentication

**MVP:** no authentication required (trusted local network)
**Phase 2:** optional `X-API-Key` header

---

## 2. ENDPOINTS

### 2.1 Chat Completions

#### POST /v1/chat/completions

Creates a chat completion for the provided messages.

**Request:**

```http
POST /v1/chat/completions HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Request Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| model | string | Yes | - | ID of the model to use |
| messages | array | Yes | - | List of messages |
| max_tokens | integer | No | 512 | Maximum number of tokens to generate |
| temperature | number | No | 0.7 | Temperature (0.0-2.0) |
| top_p | number | No | 0.9 | Top-p sampling (0.0-1.0) |
| stream | boolean | No | false | Streaming (not supported in MVP) |

**Message Object:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| role | string | Yes | "system", "user", or "assistant" |
| content | string | Yes | Message content |
| name | string | No | Name of the sender |

**Response (200 OK):**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706000000,
  "model": "gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  }
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | List of generated responses |
| choices[].index | integer | Index of the choice |
| choices[].message | object | Generated message |
| choices[].finish_reason | string | "stop" or "length" |
| usage | object | Token usage statistics |

**Errors:**

| Status | Code | Description |
|--------|------|-------------|
| 400 | invalid_request | Malformed request |
| 404 | model_not_found | Model not available |
| 413 | context_length_exceeded | Context too long |
| 500 | internal_error | Internal error |
| 503 | backend_unavailable | Backend unavailable |
| 504 | inference_timeout | Inference timeout |

---

### 2.2 Models

#### GET /v1/models

Lists the available models.
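A minimal Python sketch of consuming this endpoint's `data` array (the response shape is documented below; the live call is commented out and assumes the gateway is reachable at the default base URL):

```python
def list_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from a GET /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]

# Sample payload matching the documented response shape:
sample = {
    "object": "list",
    "data": [
        {"id": "gpt-oss-20b", "object": "model", "created": 1706000000, "owned_by": "ollama"},
        {"id": "gpt-oss-20b:erp-core", "object": "model", "created": 1706000000, "owned_by": "ollama"},
    ],
}
print(list_model_ids(sample))  # ['gpt-oss-20b', 'gpt-oss-20b:erp-core']

# Against a live gateway (uncomment with the service running):
# import json, urllib.request
# with urllib.request.urlopen("http://localhost:3160/v1/models") as resp:
#     print(list_model_ids(json.load(resp)))
```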
**Request:**

```http
GET /v1/models HTTP/1.1
Host: localhost:3160
```

**Response (200 OK):**

```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss-20b",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    },
    {
      "id": "gpt-oss-20b:erp-core",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    }
  ]
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| object | string | Always "list" |
| data | array | List of models |
| data[].id | string | Model ID |
| data[].object | string | Always "model" |
| data[].created | integer | Unix timestamp |
| data[].owned_by | string | Owner ("ollama") |

---

### 2.3 Health

#### GET /health

Checks the service status.

**Request:**

```http
GET /health HTTP/1.1
Host: localhost:3160
```

**Response (200 OK - Healthy):**

```json
{
  "status": "healthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "up"
  }
}
```

**Response (503 Service Unavailable - Unhealthy):**

```json
{
  "status": "unhealthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "down"
  }
}
```

**Status Values:**

| Status | Description |
|--------|-------------|
| healthy | All components operational |
| degraded | Some components have issues |
| unhealthy | Service not operational |

---

### 2.4 MCP Tools (Phase 2)

#### GET /mcp/tools

Lists the available MCP tools.
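A client typically pairs this discovery endpoint with `POST /mcp/tools/:name` (described below): enumerate the tools, then call one by name. A hedged Python sketch of building such a call, reusing the gateway address and the classify arguments from this document (no network access needed to run it):

```python
import json

BASE_URL = "http://localhost:3160"  # default gateway address

def build_tool_request(name: str, arguments: dict) -> tuple[str, bytes]:
    """Build the URL and JSON body for a POST /mcp/tools/:name call."""
    url = f"{BASE_URL}/mcp/tools/{name}"
    body = json.dumps(arguments).encode("utf-8")
    return url, body

url, body = build_tool_request(
    "classify",
    {
        "text": "The customer reported a bug in the login form",
        "categories": ["bug", "feature", "question", "documentation"],
    },
)
print(url)  # http://localhost:3160/mcp/tools/classify
```

The resulting `url` and `body` can be sent with any HTTP client, with `Content-Type: application/json` as required in section 1.2.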
**Request:**

```http
GET /mcp/tools HTTP/1.1
Host: localhost:3160
```

**Response (200 OK):**

```json
{
  "tools": [
    {
      "name": "classify",
      "description": "Classify text into predefined categories",
      "version": "1.0.0",
      "parameters": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "description": "Text to classify"
          },
          "categories": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Possible categories"
          }
        },
        "required": ["text", "categories"]
      }
    },
    {
      "name": "extract",
      "description": "Extract structured data from text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "summarize",
      "description": "Summarize text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "rewrite",
      "description": "Rewrite text with specific style",
      "version": "1.0.0",
      "parameters": {...}
    }
  ]
}
```

---

#### POST /mcp/tools/:name

Executes a specific MCP tool.

**Request:**

```http
POST /mcp/tools/classify HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "text": "The customer reported a bug in the login form",
  "categories": ["bug", "feature", "question", "documentation"]
}
```

**Response (200 OK):**

```json
{
  "category": "bug",
  "confidence": 0.92,
  "reasoning": "The text mentions 'bug' and describes a problem with functionality"
}
```

---

## 3. ERROR RESPONSES

### 3.1 Error Format

All error responses follow the OpenAI format:

```json
{
  "error": {
    "code": "error_code",
    "message": "Human readable message",
    "type": "error_type",
    "param": "parameter_name"
  }
}
```

### 3.2 Error Types

| Type | Description |
|------|-------------|
| invalid_request_error | Malformed request or invalid parameters |
| authentication_error | Authentication failed (Phase 2) |
| rate_limit_error | Rate limit exceeded (Phase 2) |
| server_error | Internal server error |

### 3.3 Error Codes

| Code | HTTP Status | Description |
|------|-------------|-------------|
| invalid_request | 400 | Invalid request |
| model_not_found | 404 | Model does not exist |
| context_length_exceeded | 413 | Context too long |
| rate_limited | 429 | Rate limit exceeded |
| backend_unavailable | 503 | Backend unavailable |
| inference_timeout | 504 | Timeout |
| internal_error | 500 | Internal error |

---

## 4. RATE LIMITS (Phase 2)

| Tier | Requests/min | Tokens/min |
|------|--------------|------------|
| small | 40 | 20000 |
| main | 10 | 50000 |

---

## 5. HEADERS

### 5.1 Request Headers

| Header | Description | Required |
|--------|-------------|----------|
| Content-Type | application/json | Yes |
| X-API-Key | API key (Phase 2) | No |
| X-Tier | Forced tier (small/main) | No |
| X-Request-ID | Tracking ID | No |

### 5.2 Response Headers

| Header | Description |
|--------|-------------|
| X-Request-ID | Request ID (generated if not provided) |
| X-Latency-Ms | Processing latency |
| X-Tier | Tier used for the request |

---

## 6. USAGE EXAMPLES

### 6.1 Python (OpenAI SDK)

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required"  # the MVP does not require an API key
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
```

### 6.2 JavaScript (OpenAI SDK)

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:3160/v1',
  apiKey: 'not-required'
});

const response = await client.chat.completions.create({
  model: 'gpt-oss-20b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is 2+2?' }
  ],
  max_tokens: 100
});

console.log(response.choices[0].message.content);
```

### 6.3 cURL

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```

### 6.4 Health Check

```bash
curl http://localhost:3160/health
```

---

## 7. AGENT INTEGRATION

### 7.1 Claude Code

```yaml
# Configuration in a Claude Code hook
mcp_servers:
  local-llm:
    url: http://localhost:3160
    capabilities:
      - chat
      - classify
      - extract
      - summarize
```

### 7.2 Trae IDE

```json
{
  "llm": {
    "provider": "openai-compatible",
    "baseUrl": "http://localhost:3160/v1",
    "model": "gpt-oss-20b"
  }
}
```

---

## 8. CHANGELOG

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-01-20 | Initial version |

---

**Controlled Document**
- Author: Requirements-Analyst Agent
- Date: 2026-01-20