# CONTEXTO-PROYECTO.md - Local LLM Agent

**Sistema:** SIMCO v4.3.0 + NEXUS v4.0
**Proyecto:** Local LLM Agent
**Nivel:** CONSUMER (L2) - STANDALONE Infrastructure
**Version:** 0.6.0
**Fecha:** 2026-01-24

---

## RESUMEN EJECUTIVO

Local LLM Agent es un **gateway de inferencia LLM** que permite a los agentes del workspace
(Claude Code, Trae, Gemini) delegar tareas simples para ahorrar contexto y tokens.

**Rol en el workspace:** Infraestructura de soporte
**Prioridad:** P1 (segun ROADMAP)
**Estado:** Production-ready

---

## PROPOSITO Y ALCANCE

### Objetivo Principal

Proporcionar una API OpenAI-compatible local que permita:
- Delegacion de tareas simples (clasificacion, extraccion, reescritura, resumen)
- Ahorro de tokens en agentes principales
- Inferencia local con modelos open-source

### Alcance

| Incluye | Excluye |
|---------|---------|
| API Gateway NestJS | Entrenamiento de modelos |
| Inference Engine Python | Modelos propietarios |
| MCP Tools (4 herramientas) | Integracion con LLMs externos |
| Multi-backend (Ollama, vLLM) | Produccion en cloud |
| Monitoring (Prometheus/Grafana) | Alta disponibilidad |

---

## STACK TECNOLOGICO

### Gateway API
- **Framework:** NestJS 10.x
- **Lenguaje:** TypeScript
- **Runtime:** Node.js 20 LTS
- **Puerto:** 3160

### Inference Engine
- **Framework:** FastAPI
- **Lenguaje:** Python 3.11
- **Puerto:** 3161

### Backends de Inferencia

| Backend | Tipo | Puerto | Uso |
|---------|------|--------|-----|
| Ollama | CPU | 11434 | Desarrollo |
| vLLM | GPU | 8000 | Produccion |

### Monitoring

| Servicio | Puerto | Uso |
|----------|--------|-----|
| Prometheus | 9090 | Metricas |
| Grafana | 3000 | Dashboard |

---

## SERVICIOS EXPUESTOS

### LLM-SVC-001: Gateway API

**Puerto:** 3160
**Path:** apps/gateway
**Estado:** production-ready

**Endpoints:**
- `POST /v1/chat/completions` - Chat completion OpenAI-compatible
- `GET /v1/models` - Lista de modelos disponibles
- `POST /v1/lora/*` - Gestion de LoRA adapters
- `POST /mcp/tools/*` - MCP Tools (classify, extract, rewrite, summarize)
- `GET /health` - Health check

### LLM-SVC-002: Inference Engine

**Puerto:** 3161
**Path:** apps/inference-engine
**Estado:** production-ready

**Endpoints:**
- `POST /chat` - Inferencia interna
- `GET /models` - Modelos cargados
- `GET /health` - Health check
- `GET /metrics` - Metricas Prometheus

---

## MCP TOOLS DISPONIBLES

| Tool | Descripcion | Tier |
|------|-------------|------|
| `classify` | Clasificar texto en categorias | small |
| `extract` | Extraer datos estructurados | small |
| `rewrite` | Reescribir texto | main |
| `summarize` | Resumir texto | main |

### Tiers de Inferencia

| Tier | Max Tokens | Max Context | Latencia Target |
|------|------------|-------------|-----------------|
| small | 512 | 4096 | 500ms |
| main | 2048 | 16384 | 2000ms |

---

## FASES DE DESARROLLO

### Fase 1: MVP (COMPLETADA)
- Gateway NestJS basico
- Inference Engine Python
- Integracion con Ollama
- Docker setup inicial

### Fase 2: MCP Tools (COMPLETADA)
- 4 MCP Tools
- Rate limiting por tier
- 98 tests pasando

### Fase 3: Produccion (COMPLETADA)
- Backend vLLM con GPU
- Multi-LoRA adapters
- Prometheus metrics
- Grafana dashboard
- Production docker-compose

---

## DEPENDENCIAS

### Runtime (al menos uno requerido)
- **Ollama:** Backend CPU para desarrollo
- **vLLM:** Backend GPU para produccion

### Opcionales
- PostgreSQL 16 (metricas, DB: local_llm_dev)
- Redis (cache, DB: 9)
- Prometheus (monitoring)
- Grafana (dashboard)

### GPU (solo para vLLM)
- NVIDIA CUDA >= 12.6
- NVIDIA Container Toolkit

---

## HERENCIA Y RELACIONES

```
workspace-v2/orchestration/
         |
         v
  local-llm-agent (STANDALONE)
         |
         v
  [Sirve a todos los proyectos via API]
```

**Tipo:** CONSUMER (L2) - STANDALONE
**Hereda de:** workspace-v2/orchestration/ (solo directivas)
**Exporta a:** Ninguno (es servicio, no biblioteca)
**Consumidores:** Todos los proyectos via API

---

## RUTAS IMPORTANTES

| Ruta | Descripcion |
|------|-------------|
| `apps/gateway/` | Gateway NestJS |
| `apps/inference-engine/` | Inference Engine Python |
| `config/` | Configuracion compartida |
| `docs/` | Documentacion del proyecto |
| `orchestration/` | Gobernanza SIMCO |

---

## CONTACTO Y EQUIPO

- **Owner:** ISEM Development
- **Agentes principales:** Claude Code, Trae

---

## NOTAS

1. Proyecto STANDALONE de infraestructura
2. Sirve a todos los proyectos del workspace via API
3. No forma parte de la jerarquia ERP
4. Phase 3 complete - Production ready
5. GPU setup requiere WSL con NVIDIA drivers

---

*CONTEXTO-PROYECTO.md Local LLM Agent v1.0.0 - Sistema SIMCO v4.3.0*