local-llm-agent/docs/70-onboarding/WSL-GPU-SETUP.md
Adrian Flores Cortes 3def230d58 Initial commit: local-llm-agent infrastructure project
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:42:45 -06:00


# WSL GPU Setup Guide
Guide for configuring NVIDIA GPU support in WSL2 for the Local LLM Agent.
## Prerequisites
| Requirement | Minimum Version |
|-------------|-----------------|
| Windows | Windows 11 (or Windows 10 21H2+) |
| WSL | WSL2 |
| NVIDIA Driver | 525.xx or newer |
| GPU | NVIDIA with CUDA support |
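The driver minimum can also be checked from a script; a minimal sketch (the `driver_ok` helper is ours, and it assumes `nvidia-smi` is on the PATH inside WSL):

```shell
# Sketch: check that the installed driver meets the 525.xx minimum.
driver_ok() {
  # $1 = full driver version string, e.g. "560.35.03"
  [ "${1%%.*}" -ge 525 ]
}

# Usage (hypothetical):
# driver_ok "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)" \
#   && echo "driver OK" || echo "driver too old"
```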
## Quick Setup
Run the automated setup script:
```bash
# From WSL Ubuntu-24.04
cd /mnt/c/Empresas/ISEM/workspace-v2/projects/local-llm-agent
chmod +x scripts/setup-wsl-gpu.sh
./scripts/setup-wsl-gpu.sh
```
## Manual Setup
### Step 1: Verify Windows NVIDIA Driver
On Windows, open PowerShell and run:
```powershell
nvidia-smi
```
Expected output shows driver version >= 525.xx. If not, update from:
https://www.nvidia.com/drivers
### Step 2: Update WSL
```powershell
# From Windows PowerShell (Admin)
wsl --update
wsl --shutdown
wsl -d Ubuntu-24.04
```
### Step 3: Verify GPU in WSL
```bash
# From WSL
nvidia-smi
```
You should see your GPU listed. If not, ensure:
- Windows NVIDIA driver is installed
- WSL is updated
- WSL was restarted after driver installation
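One more thing worth checking: WSL2 exposes the Windows driver's userspace libraries (including `libcuda.so.1`) under `/usr/lib/wsl/lib`. A small sketch (the directory parameter exists only to make the check portable; the helper name is ours):

```shell
# Sketch: verify the WSL GPU driver passthrough directory is populated.
# A missing libcuda.so.1 means the Windows-side driver install did not
# propagate into WSL.
check_wsl_gpu_libs() {
  local dir="${1:-/usr/lib/wsl/lib}"
  if [ -e "$dir/libcuda.so.1" ]; then
    echo "passthrough present"
  else
    echo "passthrough missing"
  fi
}
```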
### Step 4: Install CUDA Toolkit
```bash
# Add NVIDIA CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
rm cuda-keyring_1.1-1_all.deb
# Install CUDA Toolkit 12.6
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-6
# Add to PATH
echo 'export PATH=/usr/local/cuda-12.6/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
# Verify
nvcc --version
```
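If you need just the release number in a script, it can be extracted from the `release X.Y` line that `nvcc --version` prints (a sketch; `nvcc_release` is our helper, not a CUDA tool):

```shell
# Sketch: pull the "X.Y" release number out of `nvcc --version` output,
# which contains a line like "Cuda compilation tools, release 12.6, V12.6.20".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage (hypothetical):
# nvcc --version | nvcc_release
```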
### Step 5: Install Docker
```bash
# Prerequisites
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
# Add Docker GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
# Add repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Add user to docker group
sudo usermod -aG docker $USER
# Log out and log back in, or:
newgrp docker
```
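The group change from `usermod -aG docker` only takes effect in a new login session. A quick sketch to confirm it is active (the `in_group` helper is ours, not a Docker tool):

```shell
# Sketch: check whether a user belongs to a group. Useful after the
# `usermod -aG docker` step, which needs a fresh login to take effect.
in_group() {
  # $1 = group name, $2 = user name
  id -nG "$2" | tr ' ' '\n' | grep -qx "$1"
}

# Usage (hypothetical):
# in_group docker "$USER" && echo "docker group active" \
#   || echo "log out/in or run: newgrp docker"
```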
### Step 6: Install NVIDIA Container Toolkit
```bash
# Add repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### Step 7: Verify GPU in Docker
```bash
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi
```
Expected output:
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.xx.xx              Driver Version: 560.xx.xx      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX XXXX        On  |   00000000:01:00.0  On |                  N/A |
| 30%   45C    P8             15W /  200W |    1234MiB /   8192MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
```
## Troubleshooting
### GPU not visible in WSL
1. **Update Windows NVIDIA driver**
- Download latest from https://www.nvidia.com/drivers
- Restart Windows
2. **Update WSL**
```powershell
wsl --update
wsl --shutdown
```
3. **Check WSL version**
```powershell
wsl -l -v
```
Ensure Ubuntu-24.04 shows VERSION 2.
### Docker can't access GPU
1. **Restart Docker**
```bash
sudo systemctl restart docker
```
2. **Reconfigure NVIDIA runtime**
```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
3. **Check Docker daemon config**
```bash
cat /etc/docker/daemon.json
```
Should contain:
```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```
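As a quick scripted check, you can grep the file for the runtime entry (a shallow sketch; it only checks that the key is present, not that the JSON is valid, and `has_nvidia_runtime` is our helper name):

```shell
# Sketch: confirm an "nvidia" runtime entry exists in daemon.json.
# This is a presence check only; `nvidia-ctk runtime configure` is
# what actually writes the entry.
has_nvidia_runtime() {
  grep -q '"nvidia"' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}
```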
### Out of Memory (OOM) errors
1. **Check GPU memory**
```bash
nvidia-smi
```
2. **Free up GPU memory**
- Close other GPU applications
- Reduce model size or batch size
3. **Configure WSL memory limit**
Create/edit `%UserProfile%\.wslconfig`:
```ini
[wsl2]
memory=16GB
processors=8
gpuSupport=true
```
### CUDA version mismatch
Ensure CUDA toolkit version matches driver support:
| Driver Version | Max CUDA Version |
|----------------|------------------|
| >= 560.x | CUDA 12.6 |
| >= 545.x | CUDA 12.3 |
| >= 525.x | CUDA 12.0 |
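The table above can be turned into a small lookup (a sketch; it assumes the driver's major version alone determines support, and `max_cuda_for_driver` is our helper name):

```shell
# Sketch: map an NVIDIA driver major version to the newest CUDA
# release it supports, per the compatibility table above.
max_cuda_for_driver() {
  local major="$1"
  if   [ "$major" -ge 560 ]; then echo "12.6"
  elif [ "$major" -ge 545 ]; then echo "12.3"
  elif [ "$major" -ge 525 ]; then echo "12.0"
  else echo "unsupported"
  fi
}

# Usage (hypothetical):
# max_cuda_for_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | cut -d. -f1)"
```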
## Hardware Requirements
### Minimum (Development)
- GPU: NVIDIA GTX 1060 6GB
- VRAM: 6GB
- Models: TinyLlama, Phi-2
### Recommended (Production)
- GPU: NVIDIA RTX 3090 / RTX 4090 / A100
- VRAM: 24GB+
- Models: Llama-2-7B, Mistral-7B, CodeLlama-7B
### Model VRAM Requirements
| Model | Parameters | Approx VRAM (FP16) |
|-------|------------|-------------------|
| TinyLlama | 1.1B | ~2GB |
| Phi-2 | 2.7B | ~6GB |
| Llama-2-7B | 7B | ~14GB |
| Mistral-7B | 7B | ~14GB |
| CodeLlama-13B | 13B | ~26GB |
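The FP16 figures follow from 2 bytes per parameter, weights only; the KV cache and activations add more on top. A sketch of the rule of thumb (`estimate_fp16_vram_gb` is our helper name):

```shell
# Sketch: rough FP16 weight footprint = parameters (billions) x 2 GB.
# Real usage is higher: leave headroom for the KV cache and activations.
estimate_fp16_vram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 2 }'
}

# e.g. estimate_fp16_vram_gb 7  prints 14.0 (matches the Llama-2-7B row)
```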
## Next Steps
After completing GPU setup:
1. Start the vLLM stack:
```bash
docker compose -f docker-compose.vllm.yml up -d
```
2. Verify vLLM health:
```bash
curl http://localhost:8000/health
```
3. Test inference:
```bash
curl http://localhost:3160/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"mistral","messages":[{"role":"user","content":"Hello"}]}'
```
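Right after `up -d`, the server may still be loading model weights, so the health check can fail at first. A polling sketch (the default URL, attempt count, and delay are assumptions to tune for your stack; `wait_for_health` is our helper name):

```shell
# Sketch: poll a health endpoint until it answers or we give up.
wait_for_health() {
  local url="${1:-http://localhost:8000/health}"
  local tries="${2:-30}" delay="${3:-2}"
  local i
  for i in $(seq "$tries"); do
    curl -fsS "$url" >/dev/null 2>&1 && return 0
    sleep "$delay"
  done
  return 1
}

# Usage (hypothetical):
# wait_for_health && echo "vLLM is up" || echo "gave up waiting"
```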
## References
- [NVIDIA CUDA on WSL](https://docs.nvidia.com/cuda/wsl-user-guide/index.html)
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- [vLLM Documentation](https://docs.vllm.ai/)
- [Docker GPU Support](https://docs.docker.com/config/containers/resource_constraints/#gpu)