Configure and optimise workloads for GPUs
You can deploy a range of pre-built Docker images to run applications and services that can take advantage of GPU acceleration, or build and deploy your own applications.
Workloads that require GPUs often have greater CPU, memory, and storage requirements as well, and you will need to ensure that your Docker image and application frameworks are compatible with the GPUs you want to use.
Configure applications to use GPUs
Check your application or library's documentation to ensure it is correctly configured to utilise the GPU. You may have to install ROCm-specific versions of your packages for AMD GPUs.
Below are examples of configuring PyTorch and TensorFlow to access a single GPU.
PyTorch
```python
import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)  # MyModel is your own torch.nn.Module
```
TensorFlow
```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means no GPU is available
print(tf.config.list_physical_devices('GPU'))
```
If you are building with libraries such as PyTorch or TensorFlow, you must ensure that you install package versions that are compatible with the CUDA or ROCm version specified by your base image.
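For example, you can check at runtime which CUDA or ROCm version your installed PyTorch build targets and whether a GPU is actually visible. This is a minimal sketch using PyTorch's version attributes; adapt it to the libraries you actually use:

```python
import torch

# CUDA toolkit version this PyTorch build was compiled against
# (None for CPU-only or ROCm builds)
print("CUDA build version:", torch.version.cuda)

# ROCm/HIP version for AMD builds (None for CUDA or CPU-only builds)
print("ROCm/HIP build version:", torch.version.hip)

# Whether a compatible GPU and driver are visible at runtime
print("GPU available:", torch.cuda.is_available())
```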
Build with GPU-optimised images
You can deploy Docker images optimised for your selected GPU directly to Northflank, or use them as base images for your own Docker builds. This helps ensure that both the platform versions and the library versions in your application are optimised for your GPU.
For example, you could use `nvidia/cuda:12.8.0-cudnn-runtime-ubuntu22.04` to specify a CUDA version in your base image, or `pytorch/pytorch:2.6.0-cuda11.8-cudnn9-devel`, which includes the PyTorch libraries as well as a specific CUDA platform and cuDNN library.
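If you build on a TensorFlow image, you can confirm the CUDA and cuDNN versions the installed wheel expects and compare them with your base image tag. A minimal sketch using `tf.sysconfig.get_build_info()` (the version keys are present on GPU builds):

```python
import tensorflow as tf

# CUDA toolkit and cuDNN versions this TensorFlow build was compiled against
build_info = tf.sysconfig.get_build_info()
print("CUDA:", build_info.get("cuda_version"))
print("cuDNN:", build_info.get("cudnn_version"))
```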
Recommended platform versions
| GPU model | Recommended versions |
|---|---|
| NVIDIA L4 | CUDA 12.0+ |
| NVIDIA A100 | CUDA 11.0–12.4 |
| NVIDIA A10G | CUDA 11.1–12.3 |
| NVIDIA M60 | CUDA 7.5–10.2 |
| AMD Radeon Pro V520 | OpenCL 2.2 |
| NVIDIA T4 | CUDA 10.0–12.2 |
| NVIDIA V100 | CUDA 9.0–12.2 |
| NVIDIA K80 | CUDA 7.5–11.1 (deprecated) |
| NVIDIA H100 | CUDA 12.0+ |
| Habana Gaudi HL-205 | SynapseAI |
| NVIDIA A10 | CUDA 11.1–12.3 |
| AMD Radeon Instinct MI25 | ROCm 2.x–5.x |
| NVIDIA L40S | CUDA 12.0+ |
| NVIDIA H200 | CUDA 12.0+ |
| NVIDIA P100 | CUDA 8.0–11.4 |
| AMD Instinct MI300X | ROCm 6.0+ |
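To confirm which GPU model your container has been scheduled on, and which platform version your framework reports, you can query the device at runtime. A minimal PyTorch sketch; the printed name should correspond to one of the models in the table above:

```python
import torch

if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. "NVIDIA A100-SXM4-80GB"
    print("GPU:", torch.cuda.get_device_name(0))
    # CUDA version the installed PyTorch build targets
    print("CUDA:", torch.version.cuda)
else:
    print("No GPU visible to PyTorch")
```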
Right-size resources
While GPU workloads offload the heavy processing to the GPU, you will still need to allocate sufficient vCPU, memory, and ephemeral storage to your services and jobs to handle large amounts of data and large files.
When working with large datasets or AI models, you may encounter crashes due to insufficient ephemeral storage, as your container uses it for temporary disk storage. You can increase the ephemeral storage for your services and jobs, but you should also save models, checkpoints, and data to persistent volumes to reduce ephemeral disk usage.
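As a rough guard, you can check how much ephemeral disk space is free before downloading a large model or dataset. A minimal sketch using the Python standard library; the `/tmp` path and the 20 GiB threshold are only examples:

```python
import shutil

# Free space on the filesystem backing /tmp (ephemeral storage), in GiB
free_gib = shutil.disk_usage("/tmp").free / (1024 ** 3)
print(f"Free ephemeral storage: {free_gib:.1f} GiB")

# Fail early if there is clearly not enough room for the download
if free_gib < 20:
    raise RuntimeError("Not enough ephemeral storage for this download")
```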
You can check the metrics, logs, and health for your containers to pinpoint bottlenecks and diagnose crashes.
If you're deploying on your own cloud, you can create custom plans to make best use of the high vCPU and memory of GPU nodes.
Persist models and training data
Your services and jobs are stateless, and will not persist any changes or downloads between restarts. You can add volumes to persist data, so you don't have to repeatedly download models and datasets, and so you can save model checkpoints.
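For example, if you mount a volume at `/data` (an assumed mount path, use whichever path you configure for your volume), you can write checkpoints there so they survive restarts. A minimal PyTorch sketch of a helper you might call from your training loop:

```python
import os
import torch

CHECKPOINT_DIR = "/data/checkpoints"  # assumed persistent volume mount path


def save_checkpoint(model: torch.nn.Module, optimizer: torch.optim.Optimizer, epoch: int) -> str:
    """Write a training checkpoint to the persistent volume and return its path."""
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    path = os.path.join(CHECKPOINT_DIR, f"epoch-{epoch}.pt")
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )
    return path
```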
You can mount persistent volumes for your applications at the default paths for model caches, for example:
| Framework/image | Default model/data path | Purpose |
|---|---|---|
| vLLM | `/root/.cache/huggingface` | Hugging Face model and tokenizer cache |
| Ollama | `/root/.ollama` | Model downloads |
| Jupyter Notebook | `/home/jovyan` | Notebook data |
Refer to your application or library's documentation to find the default paths, or the environment variables that override the default download directories.
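For example, Hugging Face libraries read the `HF_HOME` environment variable and PyTorch Hub reads `TORCH_HOME`, so you can point their caches at a mounted volume before those libraries are imported. The `/data/...` paths below are assumptions, and you can also set these values in your service's environment configuration rather than in code:

```python
import os

# Point library caches at a persistent volume (assumed to be mounted at /data)
os.environ.setdefault("HF_HOME", "/data/hf-cache")        # Hugging Face models and tokenizers
os.environ.setdefault("TORCH_HOME", "/data/torch-cache")  # torch.hub downloads
```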
Use external storage
You can also configure your application to use external storage to persist models, datasets, configuration, and other data outside your service and job containers.
This allows you to scale your instances up while persisting data, and to share data between different services and jobs.
You can deploy databases, as well as MinIO, an S3-compatible object store, and use the relevant SDKs or client libraries in your application.
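For example, with a MinIO instance you can use any S3-compatible client, such as boto3, to upload model artifacts from one job and download them in another service. A minimal sketch; the endpoint, bucket name, and credential variable names are assumptions for illustration:

```python
import os
import boto3

# Connect to an S3-compatible MinIO endpoint (values are illustrative)
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["MINIO_ENDPOINT"],         # e.g. http://minio:9000
    aws_access_key_id=os.environ["MINIO_ACCESS_KEY"],
    aws_secret_access_key=os.environ["MINIO_SECRET_KEY"],
)

# Upload a trained model, then any other service can download it by key
s3.upload_file("/data/checkpoints/model.pt", "models", "model-v1.pt")
s3.download_file("models", "model-v1.pt", "/tmp/model.pt")
```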