

How to install PyTorch and set it up for production
If you're new to machine learning, the first thing to know is this: PyTorch is the toolkit you’ll use to actually build and train your models, and installing it correctly is the first step to getting anything working.
It gives you the ability to do math on big chunks of data (called tensors), use your GPU to speed things up, and write neural networks in Python that can learn from images, text, or just about anything else.
PyTorch is an open-source machine learning framework built by Meta’s AI Research lab. Think of it like NumPy on steroids: it handles the math you need for deep learning, but it’s GPU-accelerated and supports automatic differentiation (which is how models learn).
Whether you're training a model to recognize cats in photos or building a recommendation engine, PyTorch is the engine under the hood.
We'll go into more detail throughout this guide.
Learning how to install PyTorch properly can save you hours of debugging later. Whether you're using a CPU-only machine or a multi-GPU server, the installation process matters.
This guide covers both basic and advanced installs, and gives you the tools to go from local development to GPU-backed production deployment with platforms like Northflank.
Let’s start simple. If you're a beginner, your best bet is to install PyTorch on your local machine and make sure it runs correctly before worrying about things like Docker or deployment.
PyTorch requires Python 3.9 or later. If you don't already have Python installed, most developers prefer using a package manager:
macOS (Homebrew):
brew install python@3.12
Ubuntu/Debian:
sudo apt update
sudo apt install python3.12 python3.12-venv python3-pip
Windows (via Chocolatey):
choco install python312
Or download directly: If you prefer the official installer, download Python 3.12+ from python.org. On Windows, make sure to check "Add Python to PATH" during installation.
Verify installation:
python3 --version  # Should show 3.9+
pip3 --version
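You can also confirm the version from inside Python itself. This stdlib-only check is a quick sketch of the same verification:

```python
import sys

# PyTorch requires Python 3.9 or later
version = sys.version.split()[0]
if sys.version_info < (3, 9):
    raise RuntimeError(f"Python 3.9+ required, found {version}")
print(f"Python {version} is new enough for PyTorch")
```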
Virtual environments keep each project's Python packages isolated, so libraries from different projects don't conflict. Run the following in your terminal:
python -m venv myenv
Then activate the environment:
- On macOS/Linux:
source myenv/bin/activate
- On Windows:
myenv\Scripts\activate
Once you activate it, your shell prompt should show (myenv).
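To double-check that you're actually inside the environment, you can ask Python where it's running from (a stdlib-only sanity check):

```python
import sys

# Inside a virtual environment, sys.prefix points at the venv directory,
# while sys.base_prefix still points at the system Python installation
in_venv = sys.prefix != sys.base_prefix
print(f"Running inside a virtual environment: {in_venv}")
```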
Use PyTorch's official install selector to generate the exact command for your platform.
CPU-only version:
pip install torch torchvision torchaudio
GPU support:
For NVIDIA GPUs with CUDA 11.8:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For Apple Silicon Macs (M1/M2/M3 with MPS acceleration):
pip install torch torchvision torchaudio
# MPS support is included by default - no special installation needed
Check your system:
- NVIDIA GPU: Run nvidia-smi and look at the "CUDA Version" listed at the top
- Apple Silicon: MPS is automatically available on M1/M2/M3 Macs running macOS 12.3+
Verify GPU acceleration:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")
Create a Python file or open a Python shell and run:
import torch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
print("Device:", torch.cuda.get_device_name(0))
If everything’s working, you’ll see your GPU name. If not, your setup is likely falling back to CPU.
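Once you know what your machine supports, a common pattern is to pick the best available device at startup and fall back gracefully. Here's a sketch (the tensor is a placeholder, just to show that data must be moved to the chosen device):

```python
import torch

# Prefer CUDA, then Apple's MPS backend, then plain CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")

# Models and tensors must both be moved to the same device
x = torch.randn(4, 3).to(device)
print(x.device)
```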
CUDA is NVIDIA’s toolkit that allows software like PyTorch to communicate with your GPU. If you want to accelerate model training or inference, you need CUDA. PyTorch comes with CUDA pre-packaged in its installation wheels, but only if you use the right command.
If you mismatch your CUDA version with your GPU drivers, your model will run on CPU even if you have a GPU. That's why nvidia-smi is your best friend: it tells you what your GPU supports, and you match that with the PyTorch install command.
If you're using AMD hardware, things get more complicated. You’ll need the ROCm (Radeon Open Compute) version of PyTorch. Fewer prebuilt packages are available, and compatibility depends on your GPU model and OS. For most beginners: if you’re using NVIDIA, stick to CUDA.
Now that you know how to install PyTorch locally, let’s look at how to containerize it using Docker so it’s portable and production-ready.
Docker lets you build your PyTorch project once and run it anywhere, with all dependencies pre-installed. This is critical for production or team projects.
Start from a base image that includes CUDA, cuDNN, and PyTorch:
FROM pytorch/pytorch:2.6.0-cuda11.8-cudnn9-devel
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "inference.py"]
This image includes:
- PyTorch 2.6
- CUDA 11.8
- cuDNN 9
These versions must match the capabilities of your target GPU. For example, if you’re deploying on an NVIDIA H100, CUDA 11.8 or newer is required.
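The requirements.txt copied into the image above lists whatever your app needs on top of the base image. A minimal hypothetical example might look like:

```
# torch is already provided by the base image, so list only your app's extras
fastapi
uvicorn
pillow
```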
You can build and test the image locally with:
docker build -t my-pytorch-app .
docker run --gpus all my-pytorch-app
If your model works locally in Docker, deploying to Northflank is straightforward. It abstracts all the GPU provisioning, networking, and monitoring.
First, push your code to GitHub. Then:
- Log into your Northflank account.
- Create a new service.
- Connect your GitHub repo.
- Select the Dockerfile path.
- Enable GPU and pick your target GPU (e.g. H100, A100).
- Set environment variables and any required secrets (like Hugging Face tokens).
- Configure autoscaling (min/max instances, memory, CPU).
- Deploy.
Once deployed, you get:
- Live logs and metrics
- GPU and memory usage graphs
- Rollbacks on failed deploys
- Volume mounting (e.g. to cache model files)
Need to serve models behind an API? Wrap your PyTorch inference in FastAPI or Flask, and Northflank can expose it over HTTPS instantly.
PyTorch not seeing your GPU? First, run this inside your environment:
import torch
print(torch.cuda.is_available())
print(torch.version.cuda)
If cuda.is_available() is False, it’s likely an issue with how you installed PyTorch: wrong CUDA version, missing GPU drivers, or a CPU-only wheel.
If your container deploys but silently uses CPU:
- Your base image may be CPU-only.
- You may not have enabled GPU on Northflank.
- You forgot to call .to(device) in your model code.
If your model crashes during inference with memory errors:
- Increase memory or ephemeral storage on Northflank.
- Mount persistent volumes for models and temp data.
- Split large batches into smaller chunks.
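Splitting large batches is straightforward with torch.split. A minimal sketch, using a placeholder model and random data to stand in for your real workload:

```python
import torch

model = torch.nn.Linear(8, 2)  # placeholder model
model.eval()

big_batch = torch.randn(100, 8)  # pretend this is too large to run at once

outputs = []
with torch.no_grad():
    # Process 16 rows at a time instead of all 100 at once
    for chunk in torch.split(big_batch, 16):
        outputs.append(model(chunk))

# Reassemble the per-chunk results into one output tensor
result = torch.cat(outputs)
print(result.shape)  # torch.Size([100, 2])
```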
If you're not sure what's going wrong, start small. Run a minimal model, test GPU access explicitly, and incrementally build up.
A lot of people think “how to install PyTorch” ends at pip install, but if you’re running on GPUs or deploying to production, that’s only the beginning. You need to:
- Match CUDA versions across drivers, wheels, and containers
- Use the right base images
- Validate GPU access
- Prepare for production (persistent storage, scaling, secrets)
Northflank removes a huge amount of overhead. No YAML, no provisioning scripts, no K8s ops. You bring the model, Northflank handles the infra.