Deploy GPT OSS 120B on Northflank
Published 5th August 2025

By Northflank
This guide helps you deploy GPT-OSS 120B on Northflank using a ready-made template. Inference is served by vLLM, so you get high-performance, scalable inference paired with a user-friendly Open WebUI out of the box.
- A GPT-OSS 120B project consisting of:
  - 1 vLLM GPT-OSS service running on 2 × NVIDIA H100 GPUs for high-throughput inference
  - 1 mounted persistent volume for storing the 117B-parameter model, enabling fast service restarts
  - 1 node pool of 2 × NVIDIA H100 GPUs backing the vLLM service
  - 1 Open WebUI service with a mounted persistent volume for UI data and user sessions
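As a rough sanity check on this hardware sizing (the figures below are approximate assumptions for illustration, not values taken from the template): GPT-OSS 120B ships with MXFP4-quantized MoE weights, so the checkpoint is on the order of 60 GB, leaving plenty of the pool's combined 160 GB of H100 memory for activations and KV cache:

```python
# Back-of-the-envelope memory sizing for the 2 x H100 node pool.
# All numbers are rough assumptions for illustration only.
gpu_count = 2
gpu_mem_gb = 80                       # per NVIDIA H100
checkpoint_gb = 60                    # ~117B params with MXFP4-quantized MoE weights

total_mem = gpu_count * gpu_mem_gb    # 160 GB across the pool
headroom = total_mem - checkpoint_gb  # left over for activations and KV cache

print(f"~{headroom} GB of GPU memory left for activations and KV cache")
```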
- Create an account on Northflank.
- Click **Deploy GPT OSS 120B Now** to begin deployment.
- Click **Deploy stack** to save and run the gpt-oss-120B template.
- Wait for the vLLM service to load the model into GPU memory and start the inference engine. You can confirm this by viewing the service logs.
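Besides watching the logs, you can poll the service's OpenAI-compatible API to check readiness: the `/v1/models` endpoint answers once the engine is up. A minimal standard-library sketch (the base URL below is a placeholder for your vLLM service's code.run domain):

```python
import json
import urllib.request

def extract_model_ids(payload: dict) -> list:
    """Pull model IDs out of an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str) -> list:
    """Return the models the vLLM server is serving; succeeds once the engine is up."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return extract_model_ids(json.load(resp))

# Usage (placeholder domain):
# print(list_models("https://your-vllm-service.code.run"))
```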
- Open the code.run domain in the Open WebUI service.
- Create your admin account in the WebUI and start interacting with GPT-OSS 120B.
Once deployed, you’ll have an OpenAI-compatible endpoint for fast, low-latency inference and a full-featured web interface for rapid responses, coding assistance, and exploration.
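Because the endpoint is OpenAI-compatible, any OpenAI client can talk to it. A minimal sketch using only the Python standard library (the base URL is a placeholder, and `openai/gpt-oss-120b` is an assumed model ID; check `/v1/models` for the exact name your deployment serves):

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "openai/gpt-oss-120b") -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(base_url: str, prompt: str) -> str:
    """POST to the vLLM server's /v1/chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (placeholder domain):
# print(chat("https://your-vllm-service.code.run", "Hello, GPT-OSS!"))
```

Any OpenAI SDK pointed at `https://<your-service>.code.run/v1` will work the same way.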