Deploy Qwen3 30B Instruct 256K on 2xH100 on Northflank

Qwen3-30B-A3B-Instruct-2507 is an open-source large language model (LLM) well suited to coding tasks. It uses a 30B-parameter Mixture of Experts (MoE) architecture with only 3B parameters active per token, and posts benchmark results comparable to Claude Sonnet 4.
It excels at few-shot learning, instruction following, and multi-turn dialogue, and its native 256K-token context window can be extended to 1M tokens for processing lengthy instructions and documents.
That makes it a practical choice for efficient, high-performance coding support.
This guide shows you how to deploy Qwen3-30B-A3B-Instruct-2507 on Northflank using a stack template, so you can start coding with it in minutes.
This stack template creates a cluster in Northflank’s cloud and sets up a vLLM service for fast, OpenAI-compatible inference of Qwen3-30B-A3B-Instruct-2507, plus an Open WebUI for easy interaction. With Northflank’s straightforward deployment and GPU-powered infrastructure, your coding assistant will be ready for complex tasks with low latency.
The template provisions:

- 1 cluster in Northflank's cloud with:
  - 1 node pool of 2x NVIDIA H100 GPUs for high-performance inference
- A Qwen3 30B Instruct project with:
  - 1 vLLM service with a mounted persistent volume for model storage
  - 1 Open WebUI service with a mounted persistent volume for user interface data
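Under the hood, the vLLM service launches the model with a command along these lines. This is an illustrative sketch, not the template's exact configuration: the model name is real, but the flag values are assumptions.

```shell
# Illustrative vLLM launch for this model on 2x H100 (flag values are assumptions).
# --tensor-parallel-size 2 shards the MoE weights across both GPUs;
# --max-model-len 262144 enables the full 256K-token context window.
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 \
  --tensor-parallel-size 2 \
  --max-model-len 262144 \
  --host 0.0.0.0 \
  --port 8000
```

The template handles this for you; the command is only useful if you want to tune the service's runtime arguments later.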
- Create an account on Northflank.
- Click "Deploy Qwen3 30B Instruct 256K on 2xH100 Now" to begin deployment.
- Click "Deploy stack" to save and run the Qwen3 30B Instruct template.
- Wait for the vLLM service to load the model.
- Open the code.run domain in the Open WebUI service.
- Create your account and start using the model.
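Beyond Open WebUI, the vLLM service exposes the standard OpenAI-compatible chat completions endpoint, so you can also call the model programmatically. A minimal sketch using only the Python standard library; the `BASE_URL` domain is a placeholder for your own service's code.run domain:

```python
import json
import urllib.request

# Placeholder: replace with your vLLM service's code.run domain.
BASE_URL = "https://your-vllm-service.code.run/v1"

# Standard OpenAI-style chat completion payload.
payload = {
    "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "max_tokens": 256,
}

request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Sending the request requires the deployed service:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(request.full_url)
```

Any OpenAI-compatible client (including the official `openai` Python package, pointed at your `BASE_URL`) works the same way.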