# Plan de Desarrollo - Local LLM Agent

**Version:** 1.0.0
**Fecha:** 2026-01-20
**Proyecto:** local-llm-agent
**Prioridad:** P1 (Infraestructura)

---

## 1. RESUMEN EJECUTIVO

### 1.1 Estado Actual

| Aspecto | Estado |
|---------|--------|
| Infraestructura base | 60% |
| Inference Engine (Python) | 70% |
| Gateway (NestJS) | 30% |
| MCP Tools | 0% |
| Tests | 5% |
| Documentacion | 40% |
| **Global** | **35%** |

### 1.2 Roadmap de Fases

```
     Fase 1 (MVP)           Fase 2 (Multi-Tool)      Fase 3 (Produccion)
    ─────────────────      ─────────────────────    ────────────────────
    [Gateway basico]  ───>  [MCP Tools]       ───>  [vLLM Backend]
    [Ollama backend]        [Tier Router]           [Multi-LoRA]
    [Health checks]         [Rate limiting]         [Continuous batching]
    [Chat completion]       [Basic auth]            [Project detection]
                            [Metrics]               [Production deploy]

         2 semanas              3 semanas               4 semanas
```

---

## 2. FASE 1: MVP (Minimum Viable Product)

### 2.1 Objetivo

Entregar un gateway funcional que permita a los agentes del workspace delegar tareas de chat completion a un LLM local via Ollama.

### 2.2 Entregables

| ID | Entregable | Descripcion | Prioridad |
|----|------------|-------------|-----------|
| F1-01 | Gateway NestJS basico | Estructura de proyecto, modulos base | MUST |
| F1-02 | Endpoint /v1/chat/completions | Chat completion OpenAI-compatible | MUST |
| F1-03 | Endpoint /v1/models | Lista de modelos | MUST |
| F1-04 | Endpoint /health | Health check | MUST |
| F1-05 | Inference Engine completo | Backend Python con Ollama | MUST |
| F1-06 | Docker setup | docker-compose funcional | MUST |
| F1-07 | Tests basicos | Unit tests criticos | SHOULD |
| F1-08 | Documentacion MVP | README, setup guide | SHOULD |

### 2.3 Tareas Detalladas

#### F1-01: Gateway NestJS basico

```yaml
tarea: F1-01
nombre: Gateway NestJS basico
duracion_estimada: 2 dias
dependencias: []
asignar_a: "@PERFIL_BACKEND"

subtareas:
  - id: F1-01-A
    nombre: Crear estructura de proyecto NestJS
    archivos:
      - apps/gateway/src/main.ts
      - apps/gateway/src/app.module.ts
      - apps/gateway/nest-cli.json
      - apps/gateway/tsconfig.json
    criterios:
      - NestJS 10.x configurado
      - TypeScript estricto
      - ESLint + Prettier configurados

  - id: F1-01-B
    nombre: Configurar modulos base
    archivos:
      - apps/gateway/src/modules/chat/chat.module.ts
      - apps/gateway/src/modules/models/models.module.ts
      - apps/gateway/src/modules/health/health.module.ts
      - apps/gateway/src/common/config/configuration.ts
    criterios:
      - ConfigModule con .env
      - Logger estructurado (pino)
      - CORS configurado

  - id: F1-01-C
    nombre: Crear InferenceClient service
    archivos:
      - apps/gateway/src/common/services/inference-client.service.ts
    criterios:
      - Cliente HTTP para Inference Engine
      - Manejo de timeouts
      - Retry basico
```

#### F1-02: Endpoint Chat Completions

```yaml
tarea: F1-02
nombre: Endpoint /v1/chat/completions
duracion_estimada: 2 dias
dependencias: [F1-01, F1-05]
asignar_a: "@PERFIL_BACKEND"

subtareas:
  - id: F1-02-A
    nombre: Crear DTOs
    archivos:
      - apps/gateway/src/modules/chat/dto/chat-completion-request.dto.ts
      - apps/gateway/src/modules/chat/dto/chat-completion-response.dto.ts
    criterios:
      - Validacion con class-validator
      - Schemas OpenAI-compatibles
      - Swagger decorators

  - id: F1-02-B
    nombre: Implementar ChatController
    archivos:
      - apps/gateway/src/modules/chat/chat.controller.ts
    criterios:
      - POST /v1/chat/completions
      - Validacion de request
      - Transformacion de response

  - id: F1-02-C
    nombre: Implementar ChatService
    archivos:
      - apps/gateway/src/modules/chat/chat.service.ts
    criterios:
      - Llamada a InferenceClient
      - Manejo de errores
      - Logging de latencia
```

#### F1-03: Endpoint Models

```yaml
tarea: F1-03
nombre: Endpoint /v1/models
duracion_estimada: 0.5 dias
dependencias: [F1-01]
asignar_a: "@PERFIL_BACKEND"

subtareas:
  - id: F1-03-A
    nombre: Implementar ModelsController
    archivos:
      - apps/gateway/src/modules/models/models.controller.ts
      - apps/gateway/src/modules/models/models.service.ts
    criterios:
      - GET /v1/models
      - Cache de 60 segundos
      - Formato OpenAI
```

#### F1-04: Endpoint Health

```yaml
tarea: F1-04
nombre: Endpoint /health
duracion_estimada: 0.5 dias
dependencias: [F1-01]
asignar_a: "@PERFIL_BACKEND"

subtareas:
  - id: F1-04-A
    nombre: Implementar HealthController
    archivos:
      - apps/gateway/src/modules/health/health.controller.ts
      - apps/gateway/src/modules/health/health.service.ts
    criterios:
      - GET /health
      - Verifica Inference Engine
      - Verifica Ollama (via IE)
      - Retorna status detallado
```

#### F1-05: Inference Engine completo

```yaml
tarea: F1-05
nombre: Inference Engine Python completo
duracion_estimada: 1 dia
dependencias: []
asignar_a: "@PERFIL_BACKEND" (Python)

subtareas:
  - id: F1-05-A
    nombre: Completar rutas faltantes
    archivos:
      - apps/inference-engine/src/routes/health.py (revisar)
      - apps/inference-engine/src/routes/models.py (revisar)
    criterios:
      - Health check completo
      - Models list formateado
      - Error handling consistente

  - id: F1-05-B
    nombre: Mejorar manejo de errores
    archivos:
      - apps/inference-engine/src/engine/ollama_backend.py
    criterios:
      - Timeouts configurables
      - Retry con backoff
      - Logging detallado

estado_actual: 70% completado
```

#### F1-06: Docker Setup

```yaml
tarea: F1-06
nombre: Docker Setup
duracion_estimada: 0.5 dias
dependencias: [F1-01, F1-05]
asignar_a: "@PERFIL_DEVOPS"

subtareas:
  - id: F1-06-A
    nombre: Completar Dockerfiles
    archivos:
      - apps/gateway/Dockerfile
      - apps/inference-engine/Dockerfile
    criterios:
      - Multi-stage builds
      - Imagen optimizada
      - Non-root user

  - id: F1-06-B
    nombre: Validar docker-compose
    archivos:
      - docker-compose.yml
    criterios:
      - Redes configuradas
      - Volumes para desarrollo
      - Health checks funcionales

estado_actual: 80% completado
```

#### F1-07: Tests basicos

```yaml
tarea: F1-07
nombre: Tests basicos
duracion_estimada: 1 dia
dependencias: [F1-02, F1-03, F1-04]
asignar_a: "@PERFIL_TESTING"

subtareas:
  - id: F1-07-A
    nombre: Unit tests Gateway
    archivos:
      - apps/gateway/test/chat.service.spec.ts
      - apps/gateway/test/models.service.spec.ts
    criterios:
      - Mock de InferenceClient
      - Casos de exito y error
      - Coverage > 50%

  - id: F1-07-B
    nombre: Unit tests Inference Engine
    archivos:
      - apps/inference-engine/tests/test_chat.py
      - apps/inference-engine/tests/test_backend.py
    criterios:
      - Mock de Ollama
      - Pytest configurado
      - Coverage > 50%
```

### 2.4 Criterios de Aceptacion Fase 1

| Criterio | Verificacion |
|----------|--------------|
| Chat completion funcional | curl POST /v1/chat/completions retorna respuesta |
| Models list funcional | curl GET /v1/models retorna lista |
| Health check funcional | curl GET /health retorna status |
| Docker funcional | docker-compose up levanta servicios |
| SDK OpenAI compatible | Script Python con openai SDK funciona |
| Latencia aceptable | p95 < 3000ms para tier small |

---

## 3. FASE 2: Multi-Tool & Features

### 3.1 Objetivo

Agregar herramientas MCP especializadas, clasificacion de tiers, rate limiting basico y metricas.

### 3.2 Entregables

| ID | Entregable | Descripcion | Prioridad |
|----|------------|-------------|-----------|
| F2-01 | MCP Tools Module | Endpoints y logica de MCP tools | SHOULD |
| F2-02 | Tool: Classify | Clasificacion de texto | SHOULD |
| F2-03 | Tool: Extract | Extraccion de datos | SHOULD |
| F2-04 | Tool: Summarize | Resumen de texto | SHOULD |
| F2-05 | Tool: Rewrite | Reescritura de texto | SHOULD |
| F2-06 | Tier Router | Clasificacion small/main | SHOULD |
| F2-07 | Rate Limiting | Limites por IP/tier | NICE |
| F2-08 | Basic Auth | API Key simple | NICE |
| F2-09 | Metrics | Prometheus metrics | NICE |

### 3.3 Tareas Detalladas

#### F2-01: MCP Tools Module

```yaml
tarea: F2-01
nombre: MCP Tools Module
duracion_estimada: 1 dia
dependencias: [Fase 1 completa]
asignar_a: "@PERFIL_BACKEND"

subtareas:
  - id: F2-01-A
    nombre: Crear modulo MCP
    archivos:
      - apps/gateway/src/modules/mcp-tools/mcp-tools.module.ts
      - apps/gateway/src/modules/mcp-tools/mcp-tools.controller.ts
      - apps/gateway/src/modules/mcp-tools/mcp-tools.service.ts
      - apps/gateway/src/modules/mcp-tools/tools-registry.ts

  - id: F2-01-B
    nombre: Crear DTOs base
    archivos:
      - apps/gateway/src/modules/mcp-tools/dto/tool-request.dto.ts
      - apps/gateway/src/modules/mcp-tools/dto/tool-response.dto.ts
```

#### F2-02 a F2-05: Herramientas MCP

```yaml
tareas: [F2-02, F2-03, F2-04, F2-05]
nombre: Herramientas MCP (classify, extract, summarize, rewrite)
duracion_estimada: 2 dias (todas)
dependencias: [F2-01]
asignar_a: "@PERFIL_BACKEND"

estructura:
  - apps/gateway/src/modules/mcp-tools/tools/classify.tool.ts
  - apps/gateway/src/modules/mcp-tools/tools/extract.tool.ts
  - apps/gateway/src/modules/mcp-tools/tools/summarize.tool.ts
  - apps/gateway/src/modules/mcp-tools/tools/rewrite.tool.ts

implementacion:
  - Cada tool define su schema de parametros
  - Cada tool genera prompt optimizado
  - Cada tool parsea respuesta del LLM
  - Todas usan tier "small" por defecto
```

#### F2-06: Tier Router

```yaml
tarea: F2-06
nombre: Tier Router
duracion_estimada: 1 dia
dependencias: [Fase 1 completa]
asignar_a: "@PERFIL_BACKEND"

subtareas:
  - id: F2-06-A
    nombre: Implementar TierService
    archivos:
      - apps/gateway/src/common/services/tier.service.ts
    logica:
      - Estimar tokens de request
      - Clasificar en small/main
      - Aplicar limites de tier

  - id: F2-06-B
    nombre: Integrar en ChatController
    criterios:
      - Clasificacion automatica
      - Respeto de header X-Tier
      - Log de tier usado
```

### 3.4 Criterios de Aceptacion Fase 2

| Criterio | Verificacion |
|----------|--------------|
| MCP tools listados | GET /mcp/tools retorna 4 tools |
| Classify funcional | POST /mcp/tools/classify clasifica correctamente |
| Tier routing funcional | Requests grandes usan tier main |
| Rate limiting funcional | Requests excesivas retornan 429 |

---

## 4. FASE 3: Produccion

### 4.1 Objetivo

Preparar el sistema para uso en produccion con vLLM, Multi-LoRA y capacidades avanzadas.

### 4.2 Entregables

| ID | Entregable | Descripcion | Prioridad |
|----|------------|-------------|-----------|
| F3-01 | vLLM Backend | Backend alternativo en WSL | NICE |
| F3-02 | Multi-LoRA | Soporte multiples adaptadores | NICE |
| F3-03 | Continuous Batching | Batching de requests | NICE |
| F3-04 | Project Detection | Deteccion automatica de proyecto | NICE |
| F3-05 | Production Deploy | Configuracion de produccion | NICE |
| F3-06 | Monitoring | Dashboard Grafana | NICE |

### 4.3 Notas

Fase 3 se planificara en detalle despues de completar Fase 2.

Requiere:
- Configuracion de WSL con CUDA
- Instalacion de vLLM
- Training de LoRA adapters

---

## 5. TIMELINE

### 5.1 Calendario Propuesto

```
Semana 1: F1-01 a F1-04 (Gateway + Endpoints)
Semana 2: F1-05 a F1-08 (Inference Engine + Tests + Docs)
          ─── ENTREGA MVP ───
Semana 3: F2-01 a F2-05 (MCP Tools)
Semana 4: F2-06 a F2-09 (Router + Rate Limiting + Metrics)
          ─── ENTREGA FASE 2 ───
Semana 5-8: Fase 3 (segun disponibilidad)
```

### 5.2 Dependencias Criticas

```
[Ollama instalado] ─────────────────────────────────────┐
                                                        │
[F1-05: Inference Engine] ──> [F1-02: Chat Endpoint] ──>├──> [MVP]
                                                        │
[F1-01: Gateway base] ──> [F1-03, F1-04: Endpoints] ────┘

[MVP] ──> [F2-01: MCP Module] ──> [F2-02..05: Tools] ──> [Fase 2]
```

---

## 6. RIESGOS Y MITIGACIONES

| Riesgo | Probabilidad | Impacto | Mitigacion |
|--------|--------------|---------|------------|
| Ollama no soporta modelo | Baja | Alto | Probar modelo antes de iniciar |
| VRAM insuficiente | Media | Alto | Usar quantizacion Q4, reducir batch |
| Latencia alta | Media | Medio | Optimizar prompts, usar tier small |
| Incompatibilidad OpenAI | Baja | Alto | Tests con SDK oficial |

---

## 7. RECURSOS REQUERIDOS

### 7.1 Humanos

| Perfil | Dedicacion | Tareas |
|--------|------------|--------|
| @PERFIL_BACKEND | 70% | Gateway, Inference Engine |
| @PERFIL_DEVOPS | 20% | Docker, deploy |
| @PERFIL_TESTING | 10% | Tests unitarios |

### 7.2 Tecnicos

| Recurso | Especificacion |
|---------|---------------|
| GPU | RTX 5060 Ti 16GB (existente) |
| RAM | 32GB minimo |
| Storage | 50GB para modelos |
| Ollama | Version >= 0.1.0 |

---

## 8. METRICAS DE EXITO

| Metrica | Objetivo MVP | Objetivo Fase 2 |
|---------|--------------|-----------------|
| Uptime | 90% | 95% |
| Latencia p95 (small) | 1000ms | 500ms |
| Latencia p95 (main) | 3000ms | 2000ms |
| Reduccion tokens externos | 20% | 30% |
| Cobertura tests | 50% | 70% |

---

## 9. PROXIMOS PASOS

1. **Inmediato:** Completar Gateway NestJS (F1-01)
2. **Esta semana:** Conectar Gateway con Inference Engine (F1-02)
3. **Siguiente semana:** Tests y documentacion MVP

---

**Documento Controlado**
- Autor: Requirements-Analyst Agent
- Fecha: 2026-01-20
- Revisor: Architecture-Analyst Agent