# API Reference - Local LLM Agent

**Version:** 1.0.0
**Base URL:** `http://localhost:3160`
**Date:** 2026-01-20

---

## 1. OVERVIEW

Local LLM Agent exposes an OpenAI-compatible REST API for transparent integration with existing clients.

### 1.1 Base URLs

| Service | URL | Description |
|---------|-----|-------------|
| API Gateway | `http://localhost:3160` | Main entry point |
| Inference Engine | `http://localhost:3161` | Backend (internal network only) |
| Ollama | `http://localhost:11434` | Runtime (host only) |

### 1.2 Content-Type

All requests must use:

```
Content-Type: application/json
```

### 1.3 Authentication

**MVP:** no authentication required (trusted local network)
**Phase 2:** optional `X-API-Key` header

---

## 2. ENDPOINTS

### 2.1 Chat Completions

#### POST /v1/chat/completions

Creates a chat completion for the provided messages.

**Request:**

```http
POST /v1/chat/completions HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Request Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| model | string | Yes | - | ID of the model to use |
| messages | array | Yes | - | List of messages |
| max_tokens | integer | No | 512 | Maximum number of tokens to generate |
| temperature | number | No | 0.7 | Temperature (0.0-2.0) |
| top_p | number | No | 0.9 | Top-p sampling (0.0-1.0) |
| stream | boolean | No | false | Streaming (not supported in MVP) |

**Message Object:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| role | string | Yes | "system", "user", or "assistant" |
| content | string | Yes | Message content |
| name | string | No | Name of the sender |

**Response (200 OK):**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706000000,
  "model": "gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  }
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | List of generated responses |
| choices[].index | integer | Index of the choice |
| choices[].message | object | Generated message |
| choices[].finish_reason | string | "stop" or "length" |
| usage | object | Token usage statistics |

**Errors:**

| Status | Code | Description |
|--------|------|-------------|
| 400 | invalid_request | Malformed request |
| 404 | model_not_found | Model not available |
| 413 | context_length_exceeded | Context too long |
| 500 | internal_error | Internal error |
| 503 | backend_unavailable | Backend unavailable |
| 504 | inference_timeout | Inference timeout |

---

### 2.2 Models

#### GET /v1/models

Lists the available models.
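A minimal Python sketch of consuming this endpoint's `data` array (the response shape is documented below; the live call is commented out and assumes the gateway is reachable at the default base URL):

```python
def list_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from a GET /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]

# Sample payload matching the documented response shape:
sample = {
    "object": "list",
    "data": [
        {"id": "gpt-oss-20b", "object": "model", "created": 1706000000, "owned_by": "ollama"},
        {"id": "gpt-oss-20b:erp-core", "object": "model", "created": 1706000000, "owned_by": "ollama"},
    ],
}
print(list_model_ids(sample))  # ['gpt-oss-20b', 'gpt-oss-20b:erp-core']

# Against a live gateway (uncomment with the service running):
# import json, urllib.request
# with urllib.request.urlopen("http://localhost:3160/v1/models") as resp:
#     print(list_model_ids(json.load(resp)))
```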
**Request:**

```http
GET /v1/models HTTP/1.1
Host: localhost:3160
```

**Response (200 OK):**

```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss-20b",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    },
    {
      "id": "gpt-oss-20b:erp-core",
      "object": "model",
      "created": 1706000000,
      "owned_by": "ollama"
    }
  ]
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| object | string | Always "list" |
| data | array | List of models |
| data[].id | string | Model ID |
| data[].object | string | Always "model" |
| data[].created | integer | Unix timestamp |
| data[].owned_by | string | Owner ("ollama") |

---

### 2.3 Health

#### GET /health

Checks the service status.

**Request:**

```http
GET /health HTTP/1.1
Host: localhost:3160
```

**Response (200 OK - Healthy):**

```json
{
  "status": "healthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "up"
  }
}
```

**Response (503 Service Unavailable - Unhealthy):**

```json
{
  "status": "unhealthy",
  "timestamp": "2026-01-20T10:30:00.000Z",
  "version": "0.1.0",
  "dependencies": {
    "inference_engine": "up",
    "ollama": "down"
  }
}
```

**Status Values:**

| Status | Description |
|--------|-------------|
| healthy | All components operational |
| degraded | Some components have issues |
| unhealthy | Service not operational |

---

### 2.4 MCP Tools (Phase 2)

#### GET /mcp/tools

Lists the available MCP tools.
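A client typically pairs this discovery endpoint with `POST /mcp/tools/:name` (described below): enumerate the tools, then call one by name. A hedged Python sketch of building such a call, reusing the gateway address and the classify arguments from this document (no network access needed to run it):

```python
import json

BASE_URL = "http://localhost:3160"  # default gateway address

def build_tool_request(name: str, arguments: dict) -> tuple[str, bytes]:
    """Build the URL and JSON body for a POST /mcp/tools/:name call."""
    url = f"{BASE_URL}/mcp/tools/{name}"
    body = json.dumps(arguments).encode("utf-8")
    return url, body

url, body = build_tool_request(
    "classify",
    {
        "text": "The customer reported a bug in the login form",
        "categories": ["bug", "feature", "question", "documentation"],
    },
)
print(url)  # http://localhost:3160/mcp/tools/classify
```

The resulting `url` and `body` can be sent with any HTTP client, with `Content-Type: application/json` as required in section 1.2.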
**Request:**

```http
GET /mcp/tools HTTP/1.1
Host: localhost:3160
```

**Response (200 OK):**

```json
{
  "tools": [
    {
      "name": "classify",
      "description": "Classify text into predefined categories",
      "version": "1.0.0",
      "parameters": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "description": "Text to classify"
          },
          "categories": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Possible categories"
          }
        },
        "required": ["text", "categories"]
      }
    },
    {
      "name": "extract",
      "description": "Extract structured data from text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "summarize",
      "description": "Summarize text",
      "version": "1.0.0",
      "parameters": {...}
    },
    {
      "name": "rewrite",
      "description": "Rewrite text with specific style",
      "version": "1.0.0",
      "parameters": {...}
    }
  ]
}
```

---

#### POST /mcp/tools/:name

Executes a specific MCP tool.

**Request:**

```http
POST /mcp/tools/classify HTTP/1.1
Host: localhost:3160
Content-Type: application/json

{
  "text": "The customer reported a bug in the login form",
  "categories": ["bug", "feature", "question", "documentation"]
}
```

**Response (200 OK):**

```json
{
  "category": "bug",
  "confidence": 0.92,
  "reasoning": "The text mentions 'bug' and describes a problem with functionality"
}
```

---

## 3. ERROR RESPONSES

### 3.1 Error Format

All error responses follow the OpenAI format:

```json
{
  "error": {
    "code": "error_code",
    "message": "Human readable message",
    "type": "error_type",
    "param": "parameter_name"
  }
}
```

### 3.2 Error Types

| Type | Description |
|------|-------------|
| invalid_request_error | Malformed request or invalid parameters |
| authentication_error | Authentication failed (Phase 2) |
| rate_limit_error | Rate limit exceeded (Phase 2) |
| server_error | Internal server error |

### 3.3 Error Codes

| Code | HTTP Status | Description |
|------|-------------|-------------|
| invalid_request | 400 | Invalid request |
| model_not_found | 404 | Model does not exist |
| context_length_exceeded | 413 | Context too long |
| rate_limited | 429 | Rate limit exceeded |
| backend_unavailable | 503 | Backend unavailable |
| inference_timeout | 504 | Timeout |
| internal_error | 500 | Internal error |

---

## 4. RATE LIMITS (Phase 2)

| Tier | Requests/min | Tokens/min |
|------|--------------|------------|
| small | 40 | 20000 |
| main | 10 | 50000 |

---

## 5. HEADERS

### 5.1 Request Headers

| Header | Description | Required |
|--------|-------------|----------|
| Content-Type | application/json | Yes |
| X-API-Key | API key (Phase 2) | No |
| X-Tier | Forced tier (small/main) | No |
| X-Request-ID | Tracking ID | No |

### 5.2 Response Headers

| Header | Description |
|--------|-------------|
| X-Request-ID | Request ID (generated if not provided) |
| X-Latency-Ms | Processing latency |
| X-Tier | Tier used for the request |

---

## 6. USAGE EXAMPLES

### 6.1 Python (OpenAI SDK)

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:3160/v1",
    api_key="not-required"  # the MVP does not require an API key
)

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
```

### 6.2 JavaScript (OpenAI SDK)

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:3160/v1',
  apiKey: 'not-required'
});

const response = await client.chat.completions.create({
  model: 'gpt-oss-20b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is 2+2?' }
  ],
  max_tokens: 100
});

console.log(response.choices[0].message.content);
```

### 6.3 cURL

```bash
curl -X POST http://localhost:3160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'
```

### 6.4 Health Check

```bash
curl http://localhost:3160/health
```

---

## 7. AGENT INTEGRATION

### 7.1 Claude Code

```yaml
# Configuration in a Claude Code hook
mcp_servers:
  local-llm:
    url: http://localhost:3160
    capabilities:
      - chat
      - classify
      - extract
      - summarize
```

### 7.2 Trae IDE

```json
{
  "llm": {
    "provider": "openai-compatible",
    "baseUrl": "http://localhost:3160/v1",
    "model": "gpt-oss-20b"
  }
}
```

---

## 8. CHANGELOG

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-01-20 | Initial version |

---

**Controlled Document**
- Author: Requirements-Analyst Agent
- Date: 2026-01-20