Deploy GPT OSS 120B on Northflank

Published 5th August 2025

This guide helps you deploy GPT-OSS 120B on Northflank using a ready-made template. Inference is served by vLLM, so you get high-performance, scalable inference behind a user-friendly Open WebUI with minimal setup.

What this template deploys

  • A GPT-OSS 120B project consisting of:
    • 1 vLLM GPT-OSS service running on 2 × NVIDIA H100 GPUs for high-throughput inference
    • 1 mounted persistent volume for storing the 117B-parameter model weights, enabling fast service restarts
    • 1 node pool of 2 × NVIDIA H100 GPUs backing the vLLM service
    • 1 Open WebUI service with a mounted persistent volume for UI data and user sessions

How to get started

  1. Create an account on Northflank.
  2. Click Deploy GPT OSS 120B Now to begin deployment.
  3. Click Deploy stack to save and run the gpt-oss-120B template.
  4. Wait for the vLLM service to load the model into GPU memory and start the inference engine. You can confirm this in the service logs, or with the readiness sketch after this list.
  5. Open the code.run domain of the Open WebUI service.
  6. Create your admin account in the WebUI and start interacting with GPT-OSS 120B.
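
If you'd rather script the readiness check in step 4 than watch the logs, here is a minimal sketch that polls vLLM's OpenAI-compatible /v1/models endpoint until the engine responds. The base URL is a placeholder for your vLLM service's code.run domain, and the sketch assumes the service is publicly exposed without an API key.

```python
# Poll the vLLM service until the model is loaded and the
# OpenAI-compatible API starts answering requests.
import time
import urllib.error
import urllib.request

# Placeholder: replace with your vLLM service's code.run domain.
BASE_URL = "https://your-vllm-service.code.run"

def wait_for_model(timeout_s: int = 1800, interval_s: int = 15) -> None:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"{BASE_URL}/v1/models", timeout=10) as resp:
                if resp.status == 200:
                    print("vLLM is ready:", resp.read().decode())
                    return
        except (urllib.error.URLError, TimeoutError):
            pass  # engine is still loading the model into GPU memory
        time.sleep(interval_s)
    raise RuntimeError("vLLM did not become ready within the timeout")

if __name__ == "__main__":
    wait_for_model()
```

Loading a 117B-parameter model into GPU memory can take several minutes, so the default timeout is generous.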

Once deployed, you’ll have an OpenAI-compatible endpoint for low-latency inference and a full-featured web interface for chat, coding assistance, and exploration.
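As a sketch of how you might call that endpoint, the snippet below uses the official openai Python client against vLLM's OpenAI-compatible API. The base URL is again a placeholder for your vLLM service's code.run domain, and the model name assumes the service loads the weights under their Hugging Face ID, openai/gpt-oss-120b.

```python
# Send a chat completion request to the deployed GPT-OSS 120B
# through vLLM's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    # Placeholder: replace with your vLLM service's code.run domain.
    base_url="https://your-vllm-service.code.run/v1",
    # vLLM accepts any key unless the server was started with --api-key.
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed served model name
    messages=[{"role": "user", "content": "Explain tensor parallelism in one paragraph."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Because the API is OpenAI-compatible, any existing OpenAI SDK integration can be pointed at this deployment by changing only the base URL and model name.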
