

How to install PyTorch and set it up for production
If you're new to machine learning, the first thing to know is this: PyTorch is the toolkit you’ll use to actually build and train your models, and installing it correctly is the first step to getting anything working.
It gives you the ability to do math on big chunks of data (called tensors), use your GPU to speed things up, and write neural networks in Python that can learn from images, text, or just about anything else.
PyTorch is an open-source machine learning framework built by Meta’s AI Research lab. Think of it like NumPy on steroids: it handles the math you need for deep learning, but it’s GPU-accelerated and supports automatic differentiation (which is how models learn).
Whether you're training a model to recognize cats in photos or building a recommendation engine, PyTorch is the engine under the hood.
We'll go into more detail throughout this guide.
Learning how to install PyTorch properly can save you hours of debugging later. Whether you're using a CPU-only machine or a multi-GPU server, the installation process matters.
This guide covers both basic and advanced installs, and gives you the tools to go from local development to GPU-backed production deployment with platforms like Northflank.
Let’s start simple. If you're a beginner, your best bet is to install PyTorch on your local machine and make sure it runs correctly before worrying about things like Docker or deployment.
PyTorch requires Python 3.9 or later. If you don't already have Python installed, most developers prefer using a package manager:
macOS (Homebrew):
brew install python@3.12
Ubuntu/Debian:
sudo apt update
sudo apt install python3.12 python3.12-venv python3-pip
Windows (via Chocolatey):
choco install python312
Or download directly: If you prefer the official installer, download Python 3.12+ from python.org. On Windows, make sure to check "Add Python to PATH" during installation.
Verify installation:
python3 --version  # Should show 3.9+
pip3 --version
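You can also confirm the version from inside Python itself. This stdlib-only check is a quick sketch of the same verification:

```python
import sys

# PyTorch requires Python 3.9 or later
version = sys.version.split()[0]
if sys.version_info < (3, 9):
    raise RuntimeError(f"Python 3.9+ required, found {version}")
print(f"Python {version} is new enough for PyTorch")
```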
Virtual environments keep each project's Python packages isolated, so libraries from different projects don't conflict. Run the following in your terminal:
python -m venv myenv
Then activate the environment:
- On macOS/Linux:
source myenv/bin/activate
- On Windows:
myenv\Scripts\activate
Once you activate it, your shell prompt should show (myenv).
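To double-check that you're actually inside the environment, you can ask Python where it's running from (a stdlib-only sanity check):

```python
import sys

# Inside a virtual environment, sys.prefix points at the venv directory,
# while sys.base_prefix still points at the system Python installation
in_venv = sys.prefix != sys.base_prefix
print(f"Running inside a virtual environment: {in_venv}")
```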
Use PyTorch's official install selector to generate the exact command for your platform.
CPU-only version:
pip install torch torchvision torchaudio
GPU support:
For NVIDIA GPUs with CUDA 11.8:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For Apple Silicon Macs (M1/M2/M3 with MPS acceleration):
pip install torch torchvision torchaudio
# MPS support is included by default - no special installation needed
Check your system:
- NVIDIA GPU: Run nvidia-smi and look at the "CUDA Version" listed at the top
- Apple Silicon: MPS is automatically available on M1/M2/M3 Macs running macOS 12.3+
Verify GPU acceleration:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")
Create a Python file or open a Python shell and run:
import torch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
print("Device:", torch.cuda.get_device_name(0))
If everything’s working, you’ll see your GPU name. If not, your setup is likely falling back to CPU.
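Once you know what your machine supports, a common pattern is to pick the best available device at startup and fall back gracefully. Here's a sketch (the tensor is a placeholder, just to show that data must be moved to the chosen device):

```python
import torch

# Prefer CUDA, then Apple's MPS backend, then plain CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")

# Models and tensors must both be moved to the same device
x = torch.randn(4, 3).to(device)
print(x.device)
```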
CUDA is NVIDIA’s toolkit that allows software like PyTorch to communicate with your GPU. If you want to accelerate model training or inference, you need CUDA. PyTorch comes with CUDA pre-packaged in its installation wheels, but only if you use the right command.
If you mismatch your CUDA version with your GPU drivers, your model will run on CPU even if you have a GPU. That's why nvidia-smi is your best friend: it tells you what your GPU supports, and you match that with the PyTorch install command.
If you're using AMD hardware, things get more complicated. You’ll need the ROCm (Radeon Open Compute) version of PyTorch. Fewer prebuilt packages are available, and compatibility depends on your GPU model and OS. For most beginners: if you’re using NVIDIA, stick to CUDA.
Now that you know how to install PyTorch locally, let’s look at how to containerize it using Docker so it’s portable and production-ready.
Docker lets you build your PyTorch project once and run it anywhere, with all dependencies pre-installed. This is critical for production or team projects.
Start from a base image that includes CUDA, cuDNN, and PyTorch:
FROM pytorch/pytorch:2.6.0-cuda11.8-cudnn9-devel
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "inference.py"]
This image includes:
- PyTorch 2.6
- CUDA 11.8
- cuDNN 9
These versions must match the capabilities of your target GPU. For example, if you’re deploying on an NVIDIA H100, CUDA 11.8 or newer is required.
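The requirements.txt copied into the image above lists whatever your app needs on top of the base image. A minimal hypothetical example might look like:

```
# torch is already provided by the base image, so list only your app's extras
fastapi
uvicorn
pillow
```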
You can build and test the image locally with:
docker build -t my-pytorch-app .
docker run --gpus all my-pytorch-app
If your model works locally in Docker, deploying to Northflank is straightforward. It abstracts all the GPU provisioning, networking, and monitoring.
First, push your code to GitHub. Then:
- Log into your Northflank account.
- Create a new service.
- Connect your GitHub repo.
- Select the Dockerfile path.
- Enable GPU and pick your target GPU (e.g. H100, A100).
- Set environment variables and any required secrets (like Hugging Face tokens).
- Configure autoscaling (min/max instances, memory, CPU).
- Deploy.
Once deployed, you get:
- Live logs and metrics
- GPU and memory usage graphs
- Rollbacks on failed deploys
- Volume mounting (e.g. to cache model files)
Need to serve models behind an API? Wrap your PyTorch inference in FastAPI or Flask, and Northflank can expose it over HTTPS instantly.
PyTorch not seeing your GPU? First, run this inside your environment:
import torch
print(torch.cuda.is_available())
print(torch.version.cuda)
If cuda.is_available() is False, it’s likely an issue with how you installed PyTorch: wrong CUDA version, missing GPU drivers, or a CPU-only wheel.
If your container deploys but silently uses CPU:
- Your base image may be CPU-only.
- You may not have enabled GPU on Northflank.
- You forgot to call .to(device) in your model code.
If your model crashes during inference with memory errors:
- Increase memory or ephemeral storage on Northflank.
- Mount persistent volumes for models and temp data.
- Split large batches into smaller chunks.
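Splitting large batches is straightforward with torch.split. A minimal sketch, using a placeholder model and random data to stand in for your real workload:

```python
import torch

model = torch.nn.Linear(8, 2)  # placeholder model
model.eval()

big_batch = torch.randn(100, 8)  # pretend this is too large to run at once

outputs = []
with torch.no_grad():
    # Process 16 rows at a time instead of all 100 at once
    for chunk in torch.split(big_batch, 16):
        outputs.append(model(chunk))

# Reassemble the per-chunk results into one output tensor
result = torch.cat(outputs)
print(result.shape)  # torch.Size([100, 2])
```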
If you're not sure what's going wrong, start small. Run a minimal model, test GPU access explicitly, and incrementally build up.
A lot of people think “how to install PyTorch” ends at pip install, but if you’re running on GPUs or deploying to production, that’s only the beginning. You need to:
- Match CUDA versions across drivers, wheels, and containers
- Use the right base images
- Validate GPU access
- Prepare for production (persistent storage, scaling, secrets)
Northflank removes a huge amount of overhead. No YAML, no provisioning scripts, no K8s ops. You bring the model, Northflank handles the infra.