Adrian Flores Cortes 3def230d58 Initial commit: local-llm-agent infrastructure project

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-02 16:42:45 -06:00

4.6 KiB

Raw Blame History

CONTEXTO-PROYECTO.md - Local LLM Agent

Sistema: SIMCO v4.3.0 + NEXUS v4.0 Proyecto: Local LLM Agent Nivel: CONSUMER (L2) - STANDALONE Infrastructure Version: 0.6.0 Fecha: 2026-01-24

RESUMEN EJECUTIVO

Local LLM Agent es un gateway de inferencia LLM que permite a los agentes del workspace (Claude Code, Trae, Gemini) delegar tareas simples para ahorrar contexto y tokens.

Rol en el workspace: Infraestructura de soporte Prioridad: P1 (segun ROADMAP) Estado: Production-ready

PROPOSITO Y ALCANCE

Objetivo Principal

Proporcionar una API OpenAI-compatible local que permita:

Delegacion de tareas simples (clasificacion, extraccion, reescritura, resumen)
Ahorro de tokens en agentes principales
Inferencia local con modelos open-source

Alcance

Incluye	Excluye
API Gateway NestJS	Entrenamiento de modelos
Inference Engine Python	Modelos propietarios
MCP Tools (4 herramientas)	Integracion con LLMs externos
Multi-backend (Ollama, vLLM)	Produccion en cloud
Monitoring (Prometheus/Grafana)	Alta disponibilidad

STACK TECNOLOGICO

Gateway API

Framework: NestJS 10.x
Lenguaje: TypeScript
Runtime: Node.js 20 LTS
Puerto: 3160

Inference Engine

Framework: FastAPI
Lenguaje: Python 3.11
Puerto: 3161

Backends de Inferencia

Backend	Tipo	Puerto	Uso
Ollama	CPU	11434	Desarrollo
vLLM	GPU	8000	Produccion

Monitoring

Servicio	Puerto	Uso
Prometheus	9090	Metricas
Grafana	3000	Dashboard

SERVICIOS EXPUESTOS

LLM-SVC-001: Gateway API

Puerto: 3160 Path: apps/gateway Estado: production-ready

Endpoints:

POST /v1/chat/completions - Chat completion OpenAI-compatible
GET /v1/models - Lista de modelos disponibles
POST /v1/lora/* - Gestion de LoRA adapters
POST /mcp/tools/* - MCP Tools (classify, extract, rewrite, summarize)
GET /health - Health check

LLM-SVC-002: Inference Engine

Puerto: 3161 Path: apps/inference-engine Estado: production-ready

Endpoints:

POST /chat - Inferencia interna
GET /models - Modelos cargados
GET /health - Health check
GET /metrics - Metricas Prometheus

MCP TOOLS DISPONIBLES

Tool	Descripcion	Tier
`classify`	Clasificar texto en categorias	small
`extract`	Extraer datos estructurados	small
`rewrite`	Reescribir texto	main
`summarize`	Resumir texto	main

Tiers de Inferencia

Tier	Max Tokens	Max Context	Latencia Target
small	512	4096	500ms
main	2048	16384	2000ms

FASES DE DESARROLLO

Fase 1: MVP (COMPLETADA)

Gateway NestJS basico
Inference Engine Python
Integracion con Ollama
Docker setup inicial

Fase 2: MCP Tools (COMPLETADA)

4 MCP Tools
Rate limiting por tier
98 tests pasando

Fase 3: Produccion (COMPLETADA)

Backend vLLM con GPU
Multi-LoRA adapters
Prometheus metrics
Grafana dashboard
Production docker-compose

DEPENDENCIAS

Runtime (al menos uno requerido)

Ollama: Backend CPU para desarrollo
vLLM: Backend GPU para produccion

Opcionales

PostgreSQL 16 (metricas, DB: local_llm_dev)
Redis (cache, DB: 9)
Prometheus (monitoring)
Grafana (dashboard)

GPU (solo para vLLM)

NVIDIA CUDA >= 12.6
NVIDIA Container Toolkit

HERENCIA Y RELACIONES

workspace-v2/orchestration/
         |
         v
  local-llm-agent (STANDALONE)
         |
         v
  [Sirve a todos los proyectos via API]

Tipo: CONSUMER (L2) - STANDALONE Hereda de: workspace-v2/orchestration/ (solo directivas) Exporta a: Ninguno (es servicio, no biblioteca) Consumidores: Todos los proyectos via API

RUTAS IMPORTANTES

Ruta	Descripcion
`apps/gateway/`	Gateway NestJS
`apps/inference-engine/`	Inference Engine Python
`config/`	Configuracion compartida
`docs/`	Documentacion del proyecto
`orchestration/`	Gobernanza SIMCO

CONTACTO Y EQUIPO

Owner: ISEM Development
Agentes principales: Claude Code, Trae

NOTAS

Proyecto STANDALONE de infraestructura
Sirve a todos los proyectos del workspace via API
No forma parte de la jerarquia ERP
Phase 3 complete - Production ready
GPU setup requiere WSL con NVIDIA drivers

CONTEXTO-PROYECTO.md Local LLM Agent v1.0.0 - Sistema SIMCO v4.3.0

4.6 KiB Raw Blame History