Deploy Qwen3 VL Instruct 235B on Northflank

Published 13th November 2025

This guide helps you deploy Qwen3 VL Instruct 235B on Northflank using a ready-made template. Inference is served by vLLM, so you get high-performance, scalable inference behind a user-friendly Open WebUI with minimal setup.

What this template deploys

  • A Qwen3 VL Instruct project consisting of:
    • 1 vLLM Qwen3 VL Instruct service with 8 × NVIDIA H200 GPUs for high-throughput inference
    • 1 mounted persistent volume for storing the model weights, enabling fast service restarts
    • 1 Open WebUI service with a mounted persistent volume for UI data and user sessions

How to get started

  1. Create an account on Northflank.
  2. Click Deploy Qwen3 VL Instruct 235B Now to begin deployment.
  3. Click Deploy stack to save and run the Qwen3 VL Instruct 235B template.
  4. Wait for the vLLM service to load the model into GPU memory and start the inference engine. You can confirm progress in the service logs; the first run takes about 10 minutes.
  5. Open the code.run domain assigned to the Open WebUI service.
  6. Create your admin account in the WebUI and start interacting with Qwen3 VL Instruct 235B.
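You can also confirm readiness in step 4 programmatically: once vLLM's engine is up, its OpenAI-compatible `/v1/models` endpoint starts responding. A minimal sketch, using only the standard library; `example.code.run` is a placeholder for your vLLM service's actual domain:

```python
import json
import urllib.request

# Placeholder: replace with the code.run domain of your vLLM service.
BASE_URL = "https://example.code.run"

def served_model_ids(models_json: str) -> list[str]:
    """Extract the model ids from a /v1/models response body."""
    return [m["id"] for m in json.loads(models_json)["data"]]

def check_ready(base_url: str = BASE_URL) -> list[str]:
    """Query /v1/models; this succeeds only once the engine has started."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return served_model_ids(resp.read().decode())
```

Until the model has finished loading, the request will fail or time out, so retrying this check is a simple readiness probe.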

Once deployed, you’ll have an OpenAI-compatible endpoint for low-latency inference and a full-featured web interface for chat, coding assistance, and exploration.
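The endpoint accepts standard OpenAI-style chat-completion requests, including image inputs for this vision-language model. A minimal sketch of building such a request; the domain and model id below are placeholders, so check your service's `/v1/models` response for the exact values:

```python
import json

# Placeholder values: replace with your service's code.run domain and the
# model id reported by its /v1/models endpoint.
BASE_URL = "https://example.code.run/v1"
MODEL_ID = "Qwen/Qwen3-VL-235B-A22B-Instruct"

def build_vision_request(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build an OpenAI-style chat-completion payload with a text + image message."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# POST this payload as JSON to f"{BASE_URL}/chat/completions".
payload = build_vision_request("Describe this image.", "https://example.com/photo.png")
print(json.dumps(payload, indent=2))
```

Because the endpoint follows the OpenAI wire format, existing OpenAI client libraries can also be pointed at it by overriding their base URL.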
