514 lines
14 KiB
Markdown
514 lines
14 KiB
Markdown
# Plan de Desarrollo - Local LLM Agent
|
|
|
|
**Version:** 1.0.0
|
|
**Fecha:** 2026-01-20
|
|
**Proyecto:** local-llm-agent
|
|
**Prioridad:** P1 (Infraestructura)
|
|
|
|
---
|
|
|
|
## 1. RESUMEN EJECUTIVO
|
|
|
|
### 1.1 Estado Actual
|
|
|
|
| Aspecto | Estado |
|
|
|---------|--------|
|
|
| Infraestructura base | 60% |
|
|
| Inference Engine (Python) | 70% |
|
|
| Gateway (NestJS) | 30% |
|
|
| MCP Tools | 0% |
|
|
| Tests | 5% |
|
|
| Documentacion | 40% |
|
|
| **Global** | **35%** |
|
|
|
|
### 1.2 Roadmap de Fases
|
|
|
|
```
|
|
Fase 1 (MVP) Fase 2 (Multi-Tool) Fase 3 (Produccion)
|
|
───────────────── ───────────────────── ────────────────────
|
|
[Gateway basico] ───> [MCP Tools] ───> [vLLM Backend]
|
|
[Ollama backend] [Tier Router] [Multi-LoRA]
|
|
[Health checks] [Rate limiting] [Continuous batching]
|
|
[Chat completion] [Basic auth] [Project detection]
|
|
[Metrics] [Production deploy]
|
|
|
|
2 semanas 3 semanas 4 semanas
|
|
```
|
|
|
|
---
|
|
|
|
## 2. FASE 1: MVP (Minimum Viable Product)
|
|
|
|
### 2.1 Objetivo
|
|
|
|
Entregar un gateway funcional que permita a los agentes del workspace delegar tareas de chat completion a un LLM local via Ollama.
|
|
|
|
### 2.2 Entregables
|
|
|
|
| ID | Entregable | Descripcion | Prioridad |
|
|
|----|------------|-------------|-----------|
|
|
| F1-01 | Gateway NestJS basico | Estructura de proyecto, modulos base | MUST |
|
|
| F1-02 | Endpoint /v1/chat/completions | Chat completion OpenAI-compatible | MUST |
|
|
| F1-03 | Endpoint /v1/models | Lista de modelos | MUST |
|
|
| F1-04 | Endpoint /health | Health check | MUST |
|
|
| F1-05 | Inference Engine completo | Backend Python con Ollama | MUST |
|
|
| F1-06 | Docker setup | docker-compose funcional | MUST |
|
|
| F1-07 | Tests basicos | Unit tests criticos | SHOULD |
|
|
| F1-08 | Documentacion MVP | README, setup guide | SHOULD |
|
|
|
|
### 2.3 Tareas Detalladas
|
|
|
|
#### F1-01: Gateway NestJS basico
|
|
|
|
```yaml
|
|
tarea: F1-01
|
|
nombre: Gateway NestJS basico
|
|
duracion_estimada: 2 dias
|
|
dependencias: []
|
|
asignar_a: "@PERFIL_BACKEND"
|
|
|
|
subtareas:
|
|
- id: F1-01-A
|
|
nombre: Crear estructura de proyecto NestJS
|
|
archivos:
|
|
- apps/gateway/src/main.ts
|
|
- apps/gateway/src/app.module.ts
|
|
- apps/gateway/nest-cli.json
|
|
- apps/gateway/tsconfig.json
|
|
criterios:
|
|
- NestJS 10.x configurado
|
|
- TypeScript estricto
|
|
- ESLint + Prettier configurados
|
|
|
|
- id: F1-01-B
|
|
nombre: Configurar modulos base
|
|
archivos:
|
|
- apps/gateway/src/modules/chat/chat.module.ts
|
|
- apps/gateway/src/modules/models/models.module.ts
|
|
- apps/gateway/src/modules/health/health.module.ts
|
|
- apps/gateway/src/common/config/configuration.ts
|
|
criterios:
|
|
- ConfigModule con .env
|
|
- Logger estructurado (pino)
|
|
- CORS configurado
|
|
|
|
- id: F1-01-C
|
|
nombre: Crear InferenceClient service
|
|
archivos:
|
|
- apps/gateway/src/common/services/inference-client.service.ts
|
|
criterios:
|
|
- Cliente HTTP para Inference Engine
|
|
- Manejo de timeouts
|
|
- Retry basico
|
|
```
|
|
|
|
#### F1-02: Endpoint Chat Completions
|
|
|
|
```yaml
|
|
tarea: F1-02
|
|
nombre: Endpoint /v1/chat/completions
|
|
duracion_estimada: 2 dias
|
|
dependencias: [F1-01, F1-05]
|
|
asignar_a: "@PERFIL_BACKEND"
|
|
|
|
subtareas:
|
|
- id: F1-02-A
|
|
nombre: Crear DTOs
|
|
archivos:
|
|
- apps/gateway/src/modules/chat/dto/chat-completion-request.dto.ts
|
|
- apps/gateway/src/modules/chat/dto/chat-completion-response.dto.ts
|
|
criterios:
|
|
- Validacion con class-validator
|
|
- Schemas OpenAI-compatibles
|
|
- Swagger decorators
|
|
|
|
- id: F1-02-B
|
|
nombre: Implementar ChatController
|
|
archivos:
|
|
- apps/gateway/src/modules/chat/chat.controller.ts
|
|
criterios:
|
|
- POST /v1/chat/completions
|
|
- Validacion de request
|
|
- Transformacion de response
|
|
|
|
- id: F1-02-C
|
|
nombre: Implementar ChatService
|
|
archivos:
|
|
- apps/gateway/src/modules/chat/chat.service.ts
|
|
criterios:
|
|
- Llamada a InferenceClient
|
|
- Manejo de errores
|
|
- Logging de latencia
|
|
```
|
|
|
|
#### F1-03: Endpoint Models
|
|
|
|
```yaml
|
|
tarea: F1-03
|
|
nombre: Endpoint /v1/models
|
|
duracion_estimada: 0.5 dias
|
|
dependencias: [F1-01]
|
|
asignar_a: "@PERFIL_BACKEND"
|
|
|
|
subtareas:
|
|
- id: F1-03-A
|
|
nombre: Implementar ModelsController
|
|
archivos:
|
|
- apps/gateway/src/modules/models/models.controller.ts
|
|
- apps/gateway/src/modules/models/models.service.ts
|
|
criterios:
|
|
- GET /v1/models
|
|
- Cache de 60 segundos
|
|
- Formato OpenAI
|
|
```
|
|
|
|
#### F1-04: Endpoint Health
|
|
|
|
```yaml
|
|
tarea: F1-04
|
|
nombre: Endpoint /health
|
|
duracion_estimada: 0.5 dias
|
|
dependencias: [F1-01]
|
|
asignar_a: "@PERFIL_BACKEND"
|
|
|
|
subtareas:
|
|
- id: F1-04-A
|
|
nombre: Implementar HealthController
|
|
archivos:
|
|
- apps/gateway/src/modules/health/health.controller.ts
|
|
- apps/gateway/src/modules/health/health.service.ts
|
|
criterios:
|
|
- GET /health
|
|
- Verifica Inference Engine
|
|
- Verifica Ollama (via IE)
|
|
- Retorna status detallado
|
|
```
|
|
|
|
#### F1-05: Inference Engine completo
|
|
|
|
```yaml
|
|
tarea: F1-05
|
|
nombre: Inference Engine Python completo
|
|
duracion_estimada: 1 dia
|
|
dependencias: []
|
|
asignar_a: "@PERFIL_BACKEND" (Python)
|
|
|
|
subtareas:
|
|
- id: F1-05-A
|
|
nombre: Completar rutas faltantes
|
|
archivos:
|
|
- apps/inference-engine/src/routes/health.py (revisar)
|
|
- apps/inference-engine/src/routes/models.py (revisar)
|
|
criterios:
|
|
- Health check completo
|
|
- Models list formateado
|
|
- Error handling consistente
|
|
|
|
- id: F1-05-B
|
|
nombre: Mejorar manejo de errores
|
|
archivos:
|
|
- apps/inference-engine/src/engine/ollama_backend.py
|
|
criterios:
|
|
- Timeouts configurables
|
|
- Retry con backoff
|
|
- Logging detallado
|
|
|
|
estado_actual: 70% completado
|
|
```
|
|
|
|
#### F1-06: Docker Setup
|
|
|
|
```yaml
|
|
tarea: F1-06
|
|
nombre: Docker Setup
|
|
duracion_estimada: 0.5 dias
|
|
dependencias: [F1-01, F1-05]
|
|
asignar_a: "@PERFIL_DEVOPS"
|
|
|
|
subtareas:
|
|
- id: F1-06-A
|
|
nombre: Completar Dockerfiles
|
|
archivos:
|
|
- apps/gateway/Dockerfile
|
|
- apps/inference-engine/Dockerfile
|
|
criterios:
|
|
- Multi-stage builds
|
|
- Imagen optimizada
|
|
- Non-root user
|
|
|
|
- id: F1-06-B
|
|
nombre: Validar docker-compose
|
|
archivos:
|
|
- docker-compose.yml
|
|
criterios:
|
|
- Redes configuradas
|
|
- Volumes para desarrollo
|
|
- Health checks funcionales
|
|
|
|
estado_actual: 80% completado
|
|
```
|
|
|
|
#### F1-07: Tests basicos
|
|
|
|
```yaml
|
|
tarea: F1-07
|
|
nombre: Tests basicos
|
|
duracion_estimada: 1 dia
|
|
dependencias: [F1-02, F1-03, F1-04]
|
|
asignar_a: "@PERFIL_TESTING"
|
|
|
|
subtareas:
|
|
- id: F1-07-A
|
|
nombre: Unit tests Gateway
|
|
archivos:
|
|
- apps/gateway/test/chat.service.spec.ts
|
|
- apps/gateway/test/models.service.spec.ts
|
|
criterios:
|
|
- Mock de InferenceClient
|
|
- Casos de exito y error
|
|
- Coverage > 50%
|
|
|
|
- id: F1-07-B
|
|
nombre: Unit tests Inference Engine
|
|
archivos:
|
|
- apps/inference-engine/tests/test_chat.py
|
|
- apps/inference-engine/tests/test_backend.py
|
|
criterios:
|
|
- Mock de Ollama
|
|
- Pytest configurado
|
|
- Coverage > 50%
|
|
```
|
|
|
|
### 2.4 Criterios de Aceptacion Fase 1
|
|
|
|
| Criterio | Verificacion |
|
|
|----------|--------------|
|
|
| Chat completion funcional | curl POST /v1/chat/completions retorna respuesta |
|
|
| Models list funcional | curl GET /v1/models retorna lista |
|
|
| Health check funcional | curl GET /health retorna status |
|
|
| Docker funcional | docker-compose up levanta servicios |
|
|
| SDK OpenAI compatible | Script Python con openai SDK funciona |
|
|
| Latencia aceptable | p95 < 3000ms para tier small |
|
|
|
|
---
|
|
|
|
## 3. FASE 2: Multi-Tool & Features
|
|
|
|
### 3.1 Objetivo
|
|
|
|
Agregar herramientas MCP especializadas, clasificacion de tiers, rate limiting basico y metricas.
|
|
|
|
### 3.2 Entregables
|
|
|
|
| ID | Entregable | Descripcion | Prioridad |
|
|
|----|------------|-------------|-----------|
|
|
| F2-01 | MCP Tools Module | Endpoints y logica de MCP tools | SHOULD |
|
|
| F2-02 | Tool: Classify | Clasificacion de texto | SHOULD |
|
|
| F2-03 | Tool: Extract | Extraccion de datos | SHOULD |
|
|
| F2-04 | Tool: Summarize | Resumen de texto | SHOULD |
|
|
| F2-05 | Tool: Rewrite | Reescritura de texto | SHOULD |
|
|
| F2-06 | Tier Router | Clasificacion small/main | SHOULD |
|
|
| F2-07 | Rate Limiting | Limites por IP/tier | NICE |
|
|
| F2-08 | Basic Auth | API Key simple | NICE |
|
|
| F2-09 | Metrics | Prometheus metrics | NICE |
|
|
|
|
### 3.3 Tareas Detalladas
|
|
|
|
#### F2-01: MCP Tools Module
|
|
|
|
```yaml
|
|
tarea: F2-01
|
|
nombre: MCP Tools Module
|
|
duracion_estimada: 1 dia
|
|
dependencias: [Fase 1 completa]
|
|
asignar_a: "@PERFIL_BACKEND"
|
|
|
|
subtareas:
|
|
- id: F2-01-A
|
|
nombre: Crear modulo MCP
|
|
archivos:
|
|
- apps/gateway/src/modules/mcp-tools/mcp-tools.module.ts
|
|
- apps/gateway/src/modules/mcp-tools/mcp-tools.controller.ts
|
|
- apps/gateway/src/modules/mcp-tools/mcp-tools.service.ts
|
|
- apps/gateway/src/modules/mcp-tools/tools-registry.ts
|
|
|
|
- id: F2-01-B
|
|
nombre: Crear DTOs base
|
|
archivos:
|
|
- apps/gateway/src/modules/mcp-tools/dto/tool-request.dto.ts
|
|
- apps/gateway/src/modules/mcp-tools/dto/tool-response.dto.ts
|
|
```
|
|
|
|
#### F2-02 a F2-05: Herramientas MCP
|
|
|
|
```yaml
|
|
tareas: [F2-02, F2-03, F2-04, F2-05]
|
|
nombre: Herramientas MCP (classify, extract, summarize, rewrite)
|
|
duracion_estimada: 2 dias (todas)
|
|
dependencias: [F2-01]
|
|
asignar_a: "@PERFIL_BACKEND"
|
|
|
|
estructura:
|
|
- apps/gateway/src/modules/mcp-tools/tools/classify.tool.ts
|
|
- apps/gateway/src/modules/mcp-tools/tools/extract.tool.ts
|
|
- apps/gateway/src/modules/mcp-tools/tools/summarize.tool.ts
|
|
- apps/gateway/src/modules/mcp-tools/tools/rewrite.tool.ts
|
|
|
|
implementacion:
|
|
- Cada tool define su schema de parametros
|
|
- Cada tool genera prompt optimizado
|
|
- Cada tool parsea respuesta del LLM
|
|
- Todas usan tier "small" por defecto
|
|
```
|
|
|
|
#### F2-06: Tier Router
|
|
|
|
```yaml
|
|
tarea: F2-06
|
|
nombre: Tier Router
|
|
duracion_estimada: 1 dia
|
|
dependencias: [Fase 1 completa]
|
|
asignar_a: "@PERFIL_BACKEND"
|
|
|
|
subtareas:
|
|
- id: F2-06-A
|
|
nombre: Implementar TierService
|
|
archivos:
|
|
- apps/gateway/src/common/services/tier.service.ts
|
|
logica:
|
|
- Estimar tokens de request
|
|
- Clasificar en small/main
|
|
- Aplicar limites de tier
|
|
|
|
- id: F2-06-B
|
|
nombre: Integrar en ChatController
|
|
criterios:
|
|
- Clasificacion automatica
|
|
- Respeto de header X-Tier
|
|
- Log de tier usado
|
|
```
|
|
|
|
### 3.4 Criterios de Aceptacion Fase 2
|
|
|
|
| Criterio | Verificacion |
|
|
|----------|--------------|
|
|
| MCP tools listados | GET /mcp/tools retorna 4 tools |
|
|
| Classify funcional | POST /mcp/tools/classify clasifica correctamente |
|
|
| Tier routing funcional | Requests grandes usan tier main |
|
|
| Rate limiting funcional | Requests excesivas retornan 429 |
|
|
|
|
---
|
|
|
|
## 4. FASE 3: Produccion
|
|
|
|
### 4.1 Objetivo
|
|
|
|
Preparar el sistema para uso en produccion con vLLM, Multi-LoRA y capacidades avanzadas.
|
|
|
|
### 4.2 Entregables
|
|
|
|
| ID | Entregable | Descripcion | Prioridad |
|
|
|----|------------|-------------|-----------|
|
|
| F3-01 | vLLM Backend | Backend alternativo en WSL | NICE |
|
|
| F3-02 | Multi-LoRA | Soporte multiples adaptadores | NICE |
|
|
| F3-03 | Continuous Batching | Batching de requests | NICE |
|
|
| F3-04 | Project Detection | Deteccion automatica de proyecto | NICE |
|
|
| F3-05 | Production Deploy | Configuracion de produccion | NICE |
|
|
| F3-06 | Monitoring | Dashboard Grafana | NICE |
|
|
|
|
### 4.3 Notas
|
|
|
|
Fase 3 se planificara en detalle despues de completar Fase 2.
|
|
|
|
Requiere:
|
|
- Configuracion de WSL con CUDA
|
|
- Instalacion de vLLM
|
|
- Training de LoRA adapters
|
|
|
|
---
|
|
|
|
## 5. TIMELINE
|
|
|
|
### 5.1 Calendario Propuesto
|
|
|
|
```
|
|
Semana 1: F1-01 a F1-04 (Gateway + Endpoints)
|
|
Semana 2: F1-05 a F1-08 (Inference Engine + Tests + Docs)
|
|
─── ENTREGA MVP ───
|
|
Semana 3: F2-01 a F2-05 (MCP Tools)
|
|
Semana 4: F2-06 a F2-09 (Router + Rate Limiting + Metrics)
|
|
─── ENTREGA FASE 2 ───
|
|
Semana 5-8: Fase 3 (segun disponibilidad)
|
|
```
|
|
|
|
### 5.2 Dependencias Criticas
|
|
|
|
```
|
|
[Ollama instalado] ─────────────────────────────────────┐
|
|
│
|
|
[F1-05: Inference Engine] ──> [F1-02: Chat Endpoint] ──>├──> [MVP]
|
|
│
|
|
[F1-01: Gateway base] ──> [F1-03, F1-04: Endpoints] ────┘
|
|
|
|
[MVP] ──> [F2-01: MCP Module] ──> [F2-02..05: Tools] ──> [Fase 2]
|
|
```
|
|
|
|
---
|
|
|
|
## 6. RIESGOS Y MITIGACIONES
|
|
|
|
| Riesgo | Probabilidad | Impacto | Mitigacion |
|
|
|--------|--------------|---------|------------|
|
|
| Ollama no soporta modelo | Baja | Alto | Probar modelo antes de iniciar |
|
|
| VRAM insuficiente | Media | Alto | Usar quantizacion Q4, reducir batch |
|
|
| Latencia alta | Media | Medio | Optimizar prompts, usar tier small |
|
|
| Incompatibilidad OpenAI | Baja | Alto | Tests con SDK oficial |
|
|
|
|
---
|
|
|
|
## 7. RECURSOS REQUERIDOS
|
|
|
|
### 7.1 Humanos
|
|
|
|
| Perfil | Dedicacion | Tareas |
|
|
|--------|------------|--------|
|
|
| @PERFIL_BACKEND | 70% | Gateway, Inference Engine |
|
|
| @PERFIL_DEVOPS | 20% | Docker, deploy |
|
|
| @PERFIL_TESTING | 10% | Tests unitarios |
|
|
|
|
### 7.2 Tecnicos
|
|
|
|
| Recurso | Especificacion |
|
|
|---------|---------------|
|
|
| GPU | RTX 5060 Ti 16GB (existente) |
|
|
| RAM | 32GB minimo |
|
|
| Storage | 50GB para modelos |
|
|
| Ollama | Version >= 0.1.0 |
|
|
|
|
---
|
|
|
|
## 8. METRICAS DE EXITO
|
|
|
|
| Metrica | Objetivo MVP | Objetivo Fase 2 |
|
|
|---------|--------------|-----------------|
|
|
| Uptime | 90% | 95% |
|
|
| Latencia p95 (small) | 1000ms | 500ms |
|
|
| Latencia p95 (main) | 3000ms | 2000ms |
|
|
| Reduccion tokens externos | 20% | 30% |
|
|
| Cobertura tests | 50% | 70% |
|
|
|
|
---
|
|
|
|
## 9. PROXIMOS PASOS
|
|
|
|
1. **Inmediato:** Completar Gateway NestJS (F1-01)
|
|
2. **Esta semana:** Conectar Gateway con Inference Engine (F1-02)
|
|
3. **Siguiente semana:** Tests y documentacion MVP
|
|
|
|
---
|
|
|
|
**Documento Controlado**
|
|
- Autor: Requirements-Analyst Agent
|
|
- Fecha: 2026-01-20
|
|
- Revisor: Architecture-Analyst Agent
|