Adrian Flores Cortes 3def230d58 Initial commit: local-llm-agent infrastructure project

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-02 16:42:45 -06:00

5.0 KiB

Raw Permalink Blame History

id	title	type	status	priority	version	created_date	updated_date
VIS-LLM-001	Vision General Local LLM Agent	Overview	Published	P1	1.0.0	2026-01-24	2026-01-24

Local LLM Agent - Vision General

Proyecto: local-llm-agent Tipo: UTILITY (herramienta de soporte) Version: 1.0.0

Proposito

Local LLM Agent es un gateway de inferencia LLM local que permite a los agentes del workspace (Claude Code, Trae, Gemini) delegar tareas simples para:

Optimizar contexto - Reducir uso de tokens en modelos principales
Reducir costos - Tareas simples se ejecutan localmente (gratis)
Mejorar latencia - Respuestas instantaneas para operaciones comunes
Habilitar offline - Funcionar sin conexion a APIs externas

Casos de Uso

Tareas Delegables (Small Tier)

Tarea	Ejemplo	Tokens Max
Clasificacion	"Este archivo es DDL o Backend?"	512
Extraccion simple	"Extrae el nombre de la funcion"	512
Validacion sintaxis	"Este JSON es valido?"	512
Resumen corto	"Resume este error en 1 linea"	512

Tareas Complejas (Main Tier)

Tarea	Ejemplo	Tokens Max
Analisis de codigo	"Encuentra bugs en esta funcion"	2048
Generacion simple	"Crea un DTO para este objeto"	2048
Explicacion	"Explica que hace este query"	2048

Arquitectura de Alto Nivel

┌──────────────────────────────────────────────────┐
│ AGENTES (Claude Code, Trae, Gemini)              │
└──────────────────────┬───────────────────────────┘
                       │ HTTP (puerto 3160)
                       ▼
┌──────────────────────────────────────────────────┐
│ LOCAL-LLM-AGENT                                  │
│ ┌──────────────────────────────────────────────┐ │
│ │ API Gateway (NestJS)                         │ │
│ │ - OpenAI-compatible endpoints                │ │
│ │ - MCP Tools endpoints                        │ │
│ │ - Tier routing (small/main)                  │ │
│ └──────────────────────┬───────────────────────┘ │
│                        ▼                         │
│ ┌──────────────────────────────────────────────┐ │
│ │ Inference Engine (Python FastAPI)            │ │
│ │ - Ollama backend (MVP)                       │ │
│ │ - vLLM backend (futuro)                      │ │
│ └──────────────────────┬───────────────────────┘ │
└──────────────────────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────┐
│ NVIDIA RTX 5060 Ti (16GB VRAM)                   │
│ Modelo: GPT-OSS 20B Q4_K_M                       │
└──────────────────────────────────────────────────┘

Stack Tecnologico

Componente	Tecnologia	Version
API Gateway	NestJS	11.x
Inference Engine	Python FastAPI	0.100+
LLM Backend	Ollama	Latest
Modelo	GPT-OSS 20B Q4_K_M	-
Hardware	NVIDIA RTX 5060 Ti	16GB VRAM

Puertos

Servicio	Puerto	Descripcion
API Gateway	3160	Punto de entrada para agentes
Inference Engine	3161	Motor de inferencia interno
Ollama	11434	Backend de modelos

Estado Actual

Componente	Estado
API Gateway	Planificado
Inference Engine	Planificado
Ollama Integration	Planificado
vLLM Integration	Futuro

Beneficios Esperados

Reduccion de costos - 60-80% menos tokens en APIs externas
Mejor latencia - < 500ms para tareas small tier
Mayor privacidad - Codigo sensible no sale a APIs externas
Disponibilidad - Funciona sin conexion a internet

Referencias

Arquitectura tecnica: ARQUITECTURA-LOCAL-LLM.md
Proyecto: ../../README.md
Inventarios: ../../orchestration/inventarios/

Creado: 2026-01-24 Actualizado: 2026-01-24

5.0 KiB Raw Permalink Blame History