local-llm-agent/docs/70-onboarding/WSL-GPU-SETUP.md
Adrian Flores Cortes 3def230d58 Initial commit: local-llm-agent infrastructure project
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:42:45 -06:00


WSL GPU Setup Guide

Guide for configuring NVIDIA GPU support in WSL2 for the Local LLM Agent.

Prerequisites

| Requirement   | Minimum Version                  |
|---------------|----------------------------------|
| Windows       | Windows 11 (or Windows 10 21H2+) |
| WSL           | WSL2                             |
| NVIDIA Driver | 525.xx or newer                  |
| GPU           | NVIDIA with CUDA support         |

Quick Setup

Run the automated setup script:

# From WSL Ubuntu-24.04
cd /mnt/c/Empresas/ISEM/workspace-v2/projects/local-llm-agent
chmod +x scripts/setup-wsl-gpu.sh
./scripts/setup-wsl-gpu.sh

Manual Setup

Step 1: Verify Windows NVIDIA Driver

On Windows, open PowerShell and run:

nvidia-smi

Expected output shows driver version >= 525.xx. If not, update from: https://www.nvidia.com/drivers
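
If you want to script the version check instead of eyeballing it, a small helper works (a sketch; min_ver and driver_ok are hypothetical names, and the version string is the one nvidia-smi reports):

```shell
# Hypothetical helper: check a driver version string against the 525.xx minimum.
# sort -V sorts version-aware, so the minimum must sort first (or be equal).
min_ver=525
driver_ok() {
  [ "$(printf '%s\n%s\n' "$min_ver" "$1" | sort -V | head -n1)" = "$min_ver" ]
}
driver_ok "560.35.03" && echo "driver OK" || echo "driver too old - update it"
```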

Step 2: Update WSL

# From Windows PowerShell (Admin)
wsl --update
wsl --shutdown
wsl -d Ubuntu-24.04

Step 3: Verify GPU in WSL

# From WSL
nvidia-smi

You should see your GPU listed. If not, ensure:

  • Windows NVIDIA driver is installed
  • WSL is updated
  • WSL was restarted after driver installation
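
WSL2 passes the GPU through a paravirtualization device node, /dev/dxg, and mounts the Windows driver's CUDA libraries under /usr/lib/wsl/lib. The bullets above can be made scriptable with a small check (a sketch; check_node is a hypothetical name):

```shell
# Hypothetical helper: report whether a GPU-related path exists.
check_node() {
  if [ -e "$1" ]; then
    echo "present: $1"
  else
    echo "MISSING: $1 - update the Windows driver and run 'wsl --update'"
    return 1
  fi
}
# || true so a missing node only prints the hint instead of aborting the shell
check_node /dev/dxg || true
check_node /usr/lib/wsl/lib/libcuda.so.1 || true
```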

Step 4: Install CUDA Toolkit

# Add NVIDIA CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
rm cuda-keyring_1.1-1_all.deb

# Install CUDA Toolkit 12.6
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-6

# Add to PATH
echo 'export PATH=/usr/local/cuda-12.6/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify
nvcc --version

Step 5: Install Docker

# Prerequisites
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg

# Add Docker GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Add user to docker group
sudo usermod -aG docker $USER

# Log out and log back in, or:
newgrp docker
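
Group membership only takes effect in a new login session, which is why the newgrp step exists. One way to confirm it took (in_group is a hypothetical helper; id -nG lists a user's groups):

```shell
# Hypothetical helper: check whether user $1 is a member of group $2.
in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}
in_group "$(id -un)" docker && echo "docker group active" \
  || echo "log out/in or run: newgrp docker"
```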

Step 6: Install NVIDIA Container Toolkit

# Add repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Step 7: Verify GPU in Docker

docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi

Expected output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.xx.xx    Driver Version: 560.xx.xx    CUDA Version: 12.6               |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX XXXX       On   | 00000000:01:00.0  On  |                  N/A |
| 30%   45C    P8              15W / 200W |    1234MiB /  8192MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+

Troubleshooting

GPU not visible in WSL

  1. Update Windows NVIDIA driver

  2. Update WSL

    wsl --update
    wsl --shutdown
    
  3. Check WSL version

    wsl -l -v
    

    Ensure Ubuntu-24.04 shows VERSION 2

Docker can't access GPU

  1. Restart Docker

    sudo systemctl restart docker
    
  2. Reconfigure NVIDIA runtime

    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    
  3. Check Docker daemon config

    cat /etc/docker/daemon.json
    

    Should contain:

    {
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
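
If you'd rather script this check than eyeball the file, a grep-based test is enough (check_nvidia_runtime is a hypothetical name; a jq query on .runtimes.nvidia would be stricter):

```shell
# Hypothetical helper: succeed if the Docker daemon config declares the
# nvidia-container-runtime binary as a runtime.
check_nvidia_runtime() {
  grep -q '"nvidia-container-runtime"' "$1"
}
check_nvidia_runtime /etc/docker/daemon.json \
  && echo "nvidia runtime configured" \
  || echo "re-run: sudo nvidia-ctk runtime configure --runtime=docker"
```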
    

Out of Memory (OOM) errors

  1. Check GPU memory

    nvidia-smi
    
  2. Free up GPU memory

    • Close other GPU applications
    • Reduce model size or batch size
  3. Configure WSL resource limits. Create or edit %UserProfile%\.wslconfig on Windows (then run wsl --shutdown so the new limits take effect):

    [wsl2]
    memory=16GB
    processors=8
    gpuSupport=true
    

CUDA version mismatch

Ensure CUDA toolkit version matches driver support:

| Driver Version | Max CUDA Version |
|----------------|------------------|
| >= 560.x       | CUDA 12.6        |
| >= 545.x       | CUDA 12.3        |
| >= 525.x       | CUDA 12.0        |
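
The table can be expressed as a helper for scripting (max_cuda_for_driver is a hypothetical name and covers only the rows above; the installed driver version itself comes from nvidia-smi --query-gpu=driver_version --format=csv,noheader):

```shell
# Hypothetical helper: map a driver version to the newest CUDA toolkit it
# supports, per the table above (anything older is reported as unknown).
max_cuda_for_driver() {
  major=${1%%.*}                      # keep only the major version number
  if   [ "$major" -ge 560 ]; then echo "12.6"
  elif [ "$major" -ge 545 ]; then echo "12.3"
  elif [ "$major" -ge 525 ]; then echo "12.0"
  else echo "unknown"
  fi
}
max_cuda_for_driver "560.35.03"   # prints 12.6
```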

Hardware Requirements

Minimum (Development)

  • GPU: NVIDIA GTX 1060 6GB
  • VRAM: 6GB
  • Models: TinyLlama, Phi-2

Recommended (Production)

  • GPU: NVIDIA RTX 3090 / RTX 4090 / A100
  • VRAM: 24GB+
  • Models: Llama-2-7B, Mistral-7B, CodeLlama-7B

Model VRAM Requirements

| Model         | Parameters | Approx VRAM (FP16) |
|---------------|------------|--------------------|
| TinyLlama     | 1.1B       | ~2GB               |
| Phi-2         | 2.7B       | ~6GB               |
| Llama-2-7B    | 7B         | ~14GB              |
| Mistral-7B    | 7B         | ~14GB              |
| CodeLlama-13B | 13B        | ~26GB              |
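
These figures follow the usual weights-only rule of thumb: at FP16 each parameter takes 2 bytes, so VRAM ≈ parameters-in-billions × 2 GB; real usage adds KV cache and activations on top. A sketch (estimate_vram_gb is a hypothetical name):

```shell
# Hypothetical helper: weights-only VRAM estimate for an FP16 model,
# given the parameter count in billions (2 bytes per parameter).
estimate_vram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 2 }'
}
estimate_vram_gb 7     # 7B params  -> 14.0
estimate_vram_gb 13    # 13B params -> 26.0
```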

Next Steps

After completing GPU setup:

  1. Start the vLLM stack:

    docker compose -f docker-compose.vllm.yml up -d
    
  2. Verify vLLM health:

    curl http://localhost:8000/health
    
  3. Test inference:

    curl http://localhost:3160/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"mistral","messages":[{"role":"user","content":"Hello"}]}'
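
The response is OpenAI-compatible JSON; the assistant's reply can be pulled out with jq (chat_reply is a hypothetical helper and assumes jq is installed):

```shell
# Hypothetical helper: extract the first choice's message content from an
# OpenAI-style chat completion response read on stdin.
chat_reply() {
  jq -r '.choices[0].message.content'
}
printf '%s' '{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}' \
  | chat_reply   # prints Hello!
```

Piping the curl command above into chat_reply gives just the model's text.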
    

References