# WSL GPU Setup Guide

Guide for configuring NVIDIA GPU support in WSL2 for the Local LLM Agent.

## Prerequisites

| Requirement | Minimum Version |
|-------------|-----------------|
| Windows | Windows 11 (or Windows 10 21H2+) |
| WSL | WSL2 |
| NVIDIA Driver | 525.xx or newer |
| GPU | NVIDIA with CUDA support |
## Quick Setup

Run the automated setup script:

```bash
# From WSL Ubuntu-24.04
cd /mnt/c/Empresas/ISEM/workspace-v2/projects/local-llm-agent
chmod +x scripts/setup-wsl-gpu.sh
./scripts/setup-wsl-gpu.sh
```
## Manual Setup

### Step 1: Verify Windows NVIDIA Driver

On Windows, open PowerShell and run:

```powershell
nvidia-smi
```

The output should show a driver version of 525.xx or newer. If not, update the driver from https://www.nvidia.com/drivers
### Step 2: Update WSL

```powershell
# From Windows PowerShell (Admin)
wsl --update
wsl --shutdown
wsl -d Ubuntu-24.04
```
### Step 3: Verify GPU in WSL

```bash
# From WSL
nvidia-smi
```

You should see your GPU listed. If not, ensure:

- Windows NVIDIA driver is installed
- WSL is updated
- WSL was restarted after driver installation
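Independently of `nvidia-smi`, you can also check for the driver libraries that WSL2 mounts in from Windows, which normally appear under `/usr/lib/wsl/lib`. A hedged sketch (`wsl_gpu_libs_present` is a made-up helper name; the path is the usual WSL2 mount point):

```bash
# Return success if WSL2's GPU passthrough libraries are present.
# WSL2 mounts the Windows driver's user-mode CUDA libraries (libcuda.so
# and friends) read-only under /usr/lib/wsl/lib.
wsl_gpu_libs_present() {
  ls "${1:-/usr/lib/wsl/lib}"/libcuda* >/dev/null 2>&1
}

if wsl_gpu_libs_present; then
  echo "WSL GPU libraries present"
else
  echo "No WSL GPU libraries found - check the Windows driver and 'wsl --update'"
fi
```

If the libraries are missing, `nvidia-smi` inside WSL cannot work no matter what you install in the distro, so fix this first.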
### Step 4: Install CUDA Toolkit

```bash
# Add NVIDIA CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
rm cuda-keyring_1.1-1_all.deb

# Install CUDA Toolkit 12.6
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-6

# Add to PATH
echo 'export PATH=/usr/local/cuda-12.6/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify
nvcc --version
```
### Step 5: Install Docker

```bash
# Prerequisites
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg

# Add Docker GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Add user to docker group
sudo usermod -aG docker $USER

# Log out and log back in, or:
newgrp docker
```
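Group changes are only picked up by new login sessions, so before moving on it is worth confirming that the current shell can actually use Docker without `sudo`. A small sanity check:

```bash
# The docker group only takes effect in a new login session (or via newgrp);
# check whether this shell already has it.
if id -nG | grep -qw docker; then
  echo "docker group active in this shell"
else
  echo "docker group not active yet - log out/in or run: newgrp docker"
fi
```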
### Step 6: Install NVIDIA Container Toolkit

```bash
# Add repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### Step 7: Verify GPU in Docker

```bash
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi
```

Expected output:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.xx.xx        Driver Version: 560.xx.xx        CUDA Version: 12.6         |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name            Persistence-M      | Bus-Id        Disp.A   | Volatile Uncorr. ECC |
| Fan  Temp  Perf      Pwr:Usage/Cap      | Memory-Usage           | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX XXXX       On   | 00000000:01:00.0    On |                  N/A |
| 30%  45C   P8        15W / 200W         | 1234MiB / 8192MiB      |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
```
## Troubleshooting

### GPU not visible in WSL

1. **Update Windows NVIDIA driver**
   - Download the latest from https://www.nvidia.com/drivers
   - Restart Windows

2. **Update WSL**
   ```powershell
   wsl --update
   wsl --shutdown
   ```

3. **Check WSL version**
   ```powershell
   wsl -l -v
   ```
   Ensure Ubuntu-24.04 shows `VERSION` 2.
### Docker can't access GPU

1. **Restart Docker**
   ```bash
   sudo systemctl restart docker
   ```

2. **Reconfigure NVIDIA runtime**
   ```bash
   sudo nvidia-ctk runtime configure --runtime=docker
   sudo systemctl restart docker
   ```

3. **Check Docker daemon config**
   ```bash
   cat /etc/docker/daemon.json
   ```
   It should contain:
   ```json
   {
     "runtimes": {
       "nvidia": {
         "path": "nvidia-container-runtime",
         "runtimeArgs": []
       }
     }
   }
   ```
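If you prefer to check the file programmatically, a small helper can verify that the `nvidia` runtime is declared. This is a hedged sketch: `has_nvidia_runtime` is a made-up name, and it uses `python3` for real JSON parsing rather than a `grep` that could match unrelated text.

```bash
# Hypothetical helper: succeed if the given daemon.json declares an
# "nvidia" entry under "runtimes".
has_nvidia_runtime() {
  python3 -c 'import json, sys
d = json.load(open(sys.argv[1]))
sys.exit(0 if "nvidia" in d.get("runtimes", {}) else 1)' "$1" 2>/dev/null
}

# Example: has_nvidia_runtime /etc/docker/daemon.json && echo "configured"
```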
### Out of Memory (OOM) errors

1. **Check GPU memory**
   ```bash
   nvidia-smi
   ```

2. **Free up GPU memory**
   - Close other GPU applications
   - Reduce model size or batch size

3. **Configure WSL memory limit**
   Create/edit `%UserProfile%\.wslconfig`:
   ```ini
   [wsl2]
   memory=16GB
   processors=8
   gpuSupport=true
   ```
### CUDA version mismatch

Ensure the CUDA toolkit version does not exceed what the installed driver supports:

| Driver Version | Max CUDA Version |
|----------------|------------------|
| >= 560.x | CUDA 12.6 |
| >= 545.x | CUDA 12.3 |
| >= 525.x | CUDA 12.0 |
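To script this check, compare the driver version that `nvidia-smi` reports against a minimum using `sort -V`. A sketch (`version_ge` is a made-up helper name; the `--query-gpu` flag is standard `nvidia-smi`):

```bash
# Succeed if version string $1 >= version string $2.
# sort -V does a numeric version sort, so the smaller version sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n1)"
if version_ge "${driver:-0}" "560"; then
  echo "Driver $driver supports CUDA 12.6"
else
  echo "Driver ${driver:-unknown} - pick a matching CUDA version from the table above"
fi
```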
## Hardware Requirements

### Minimum (Development)

- GPU: NVIDIA GTX 1060 6GB
- VRAM: 6GB
- Models: TinyLlama, Phi-2

### Recommended (Production)

- GPU: NVIDIA RTX 3090 / RTX 4090 / A100
- VRAM: 24GB+
- Models: Llama-2-7B, Mistral-7B, CodeLlama-7B

### Model VRAM Requirements

| Model | Parameters | Approx. VRAM (FP16) |
|-------|------------|---------------------|
| TinyLlama | 1.1B | ~2GB |
| Phi-2 | 2.7B | ~6GB |
| Llama-2-7B | 7B | ~14GB |
| Mistral-7B | 7B | ~14GB |
| CodeLlama-13B | 13B | ~26GB |
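The FP16 figures above follow a simple rule of thumb: roughly 2 bytes per parameter for the weights alone, before activation and KV-cache overhead. As a quick sketch (`vram_gb` is a made-up helper):

```bash
# Rough FP16 weight footprint: parameters (in billions) x 2 bytes/param = GB.
# Real usage is higher once activations and the KV cache are allocated.
vram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.0f\n", p * 2 }'
}

vram_gb 1.1   # TinyLlama: ~2 GB
vram_gb 7     # Llama-2-7B / Mistral-7B: ~14 GB
vram_gb 13    # CodeLlama-13B: ~26 GB
```

Quantized formats (e.g. 4-bit) cut these numbers by roughly 4x, which is how 7B models fit on the 6GB development minimum.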
## Next Steps

After completing GPU setup:

1. Start the vLLM stack:
   ```bash
   docker compose -f docker-compose.vllm.yml up -d
   ```

2. Verify vLLM health:
   ```bash
   curl http://localhost:8000/health
   ```

3. Test inference:
   ```bash
   curl http://localhost:3160/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model":"mistral","messages":[{"role":"user","content":"Hello"}]}'
   ```
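Since vLLM can take a while to load model weights, the health check in step 2 is more useful wrapped in a retry loop. A sketch (`wait_for` is a made-up helper; adjust the attempt count and URL to your setup):

```bash
# Retry a command until it succeeds or the attempt limit is reached.
# Usage: wait_for <attempts> <delay-seconds> <command...>
wait_for() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: poll the health endpoint for up to ~60s (30 tries x 2s):
#   wait_for 30 2 curl -sf http://localhost:8000/health
```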
## References

- [NVIDIA CUDA on WSL](https://docs.nvidia.com/cuda/wsl-user-guide/index.html)
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- [vLLM Documentation](https://docs.vllm.ai/)
- [Docker GPU Support](https://docs.docker.com/config/containers/resource_constraints/#gpu)