Docker GPU PyTorch Tutorial 2024 - Complete Setup Guide

Learn how to create a Dockerfile that enables PyTorch with NVIDIA GPU support for deep learning workloads

Quick Navigation

Difficulty: 🟡 Intermediate
Estimated Time: 15-20 minutes
Prerequisites: Docker installed, NVIDIA Container Toolkit, basic Docker knowledge, NVIDIA GPU with drivers

What You'll Learn

This tutorial covers essential GPU-enabled Docker concepts and tools:

  • NVIDIA Docker Setup - Installing and configuring the NVIDIA Container Toolkit
  • GPU-Enabled Dockerfile - Creating containers with CUDA support
  • PyTorch Integration - Installing and configuring PyTorch for GPU workloads
  • Testing and Validation - Verifying GPU access and performance
  • Best Practices - Production-ready GPU container strategies

Introduction

Docker makes it easy to package deep learning environments reproducibly, and a GPU-enabled container lets PyTorch take full advantage of your NVIDIA hardware. In this article, we'll walk you through how to create a Dockerfile that enables PyTorch with NVIDIA GPU support.

Prerequisites

Before we dive into creating the Dockerfile, make sure you have the following software installed:

  • Docker
  • NVIDIA Container Toolkit

You can install the NVIDIA Container Toolkit (the successor to the now-deprecated nvidia-docker2 package) from NVIDIA's apt repository with the following command:

sudo apt-get install -y nvidia-container-toolkit

The toolkit allows Docker to access your GPU, which is critical for running deep learning frameworks like PyTorch.
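
After installing, register the NVIDIA runtime with Docker and restart the daemon. On a typical systemd-based host, this looks like:

# Register the NVIDIA runtime with Docker
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker so the change takes effect
sudo systemctl restart docker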

Writing the Dockerfile

Below is a Dockerfile that starts with the NVIDIA CUDA base image and installs all necessary dependencies for PyTorch.

Full Dockerfile

Here is the complete Dockerfile:

FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive

# Install system dependencies
RUN apt-get update && \
    apt-get install -y \
        git \
        python3-pip \
        python3-dev \
        python3-opencv \
        libglib2.0-0 && \
    rm -rf /var/lib/apt/lists/*

# Upgrade pip first so it can resolve and install current wheels
RUN python3 -m pip install --upgrade pip

# Install any Python packages you need
COPY requirements.txt requirements.txt
RUN python3 -m pip install -r requirements.txt

# Install PyTorch, torchvision, and torchaudio built for CUDA 12.6
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

# Set the working directory
WORKDIR /app

# Set the entrypoint
ENTRYPOINT [ "python3" ]

Step-by-Step Dockerfile Explanation

Base Image

We'll start with an NVIDIA CUDA image. This image includes CUDA and cuDNN, which are necessary to utilize NVIDIA GPUs. The Ubuntu 22.04 variant ships Python 3.10, which current PyTorch wheels require.

FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04

Set Environment Variables

Setting DEBIAN_FRONTEND to noninteractive will prevent interactive prompts during installation, making the Docker build process smoother.

ENV DEBIAN_FRONTEND=noninteractive
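
Note that ENV persists into the final image and every container started from it. If you'd rather suppress the prompts only during the build, a common alternative (an optional tweak, not required here) is a build argument:

# Applies at build time only; not present in running containers
ARG DEBIAN_FRONTEND=noninteractive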

Install System Dependencies

Next, we install essential tools and Python dependencies, removing the apt package lists afterwards to keep the image lean:

RUN apt-get update && \
    apt-get install -y \
        git \
        python3-pip \
        python3-dev \
        python3-opencv \
        libglib2.0-0 && \
    rm -rf /var/lib/apt/lists/*

Upgrade Pip

It's a good idea to upgrade pip before installing anything, so it can resolve and install current wheels.

RUN python3 -m pip install --upgrade pip

Install Python Packages

We will use a requirements.txt file to install any Python libraries the project needs. Make sure this file exists in your project directory.

COPY requirements.txt requirements.txt
RUN python3 -m pip install -r requirements.txt

Install PyTorch and Torchvision

Since the base image provides CUDA 12.6, we install PyTorch wheels built against the same CUDA version. The --index-url option points pip at PyTorch's wheel index for CUDA 12.6 builds, keeping the framework and the image's CUDA runtime in sync.

RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

Set the Working Directory

The WORKDIR command sets the working directory inside the container where your code will reside.

WORKDIR /app

Set the Entrypoint

Finally, set the default command for the container. Here, it specifies running Python scripts using Python 3.

ENTRYPOINT [ "python3" ]
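
Because of this entrypoint, anything appended after the image name on the docker run command line is passed to python3 as arguments. For example, to execute your-script.py (from the project structure shown later):

docker run --rm --gpus all -v $(pwd):/app pytorch-gpu your-script.py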

Building and Running the Docker Image

Once you've saved the Dockerfile, you can build and run the Docker container as follows:

Build the Docker Image

docker build -t pytorch-gpu .

This command builds a Docker image named pytorch-gpu from the Dockerfile in the current directory; the -f flag is only needed when your Dockerfile has a non-default name or path.

Run the Docker Container

Use the --gpus all flag to give the container access to all available GPUs. The -v $(pwd):/app flag mounts the current directory to /app inside the container, and -it attaches an interactive terminal.

docker run --name pytorch-container --gpus all -it --rm -v $(pwd):/app pytorch-gpu

The --rm flag removes the container after it stops, keeping your environment clean. Because the image's entrypoint is python3, running it with no arguments drops you into an interactive Python session.
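
For a quick one-off check instead of an interactive session, you can pass a -c snippet straight to the python3 entrypoint:

docker run --rm --gpus all pytorch-gpu -c "import torch; print(torch.cuda.is_available())"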

Project Structure

Make sure you have the following files in your project directory:

your-project/
├── Dockerfile
├── requirements.txt
└── your-script.py

Example requirements.txt

numpy
pandas
matplotlib
scikit-learn

Testing GPU Availability

You can test if GPU support is working by running a simple PyTorch script:

import torch

# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")

# Check CUDA version
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    print(f"Current GPU: {torch.cuda.current_device()}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

Best Practices

  1. Pin CUDA versions - Make sure your host driver supports the CUDA version inside the container
  2. Optimize layer caching - Order Dockerfile commands from least to most frequently changing
  3. Use multi-stage builds - For production images, consider multi-stage builds to reduce size
  4. Security considerations - Run containers with minimal privileges
  5. Resource limits - Set appropriate memory and CPU limits for your workloads, as shown in the example after this list
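
For instance, points 4 and 5 can be applied at docker run time. The sketch below restricts a container to a single GPU, 8 GB of RAM, and 4 CPUs; the exact limits are placeholders you should adjust to your hardware:

# Expose only GPU 0 and cap memory and CPU usage
docker run --rm \
  --gpus '"device=0"' \
  --memory=8g \
  --cpus=4 \
  -v $(pwd):/app \
  pytorch-gpu your-script.py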

Troubleshooting

Common Issues

  • GPU not accessible: Ensure the NVIDIA Container Toolkit is installed and the Docker runtime is configured
  • CUDA version mismatch: Verify CUDA versions between host and container
  • Memory issues: Adjust container memory limits based on your GPU memory

Useful Commands

# Check Docker GPU support
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi

# Check container GPU access
docker exec -it pytorch-container nvidia-smi

# View container logs
docker logs pytorch-container

Conclusion

Creating a Dockerfile for GPU-enabled PyTorch is straightforward once the right base image and dependencies are in place. Docker gives you consistent environments across machines while still making full use of NVIDIA GPUs for deep learning, and you can extend the Dockerfile with additional dependencies as your project grows. This flexibility helps streamline workflows and scale AI experiments efficiently.


Tags: #Docker #PyTorch #NVIDIAGPU #DeepLearning #CUDA #MachineLearning #AIDevelopment #TechTutorial #SoftwareEngineering #PythonProgramming