Deploy Qwen3 30B Thinking 256K on 2xH100 on Northflank

Qwen3-30B-A3B-Thinking-2507 is an open-weight large language model (LLM) built for complex reasoning and coding tasks. It uses a 30B-parameter Mixture-of-Experts (MoE) architecture that activates only about 3B parameters per token, keeping inference efficient while posting benchmark results competitive with Claude Sonnet 4.
It shines in chain-of-thought reasoning, multi-step problem solving, and tool-augmented workflows, and its native 256K-token context window can be extended to 1M tokens for handling extensive documents and long-horizon dialogues.
That combination makes it a practical choice for efficient, high-performance coding support.
This guide shows you how to deploy Qwen3-30B-A3B-Thinking-2507 on Northflank using a stack template, so you can start coding with it in minutes.
This stack template creates a cluster in Northflank’s cloud and sets up a vLLM service for fast, OpenAI-compatible inference of Qwen3-30B-A3B-Thinking-2507, plus an Open WebUI for easy interaction.
- 1 cluster in Northflank’s cloud with:
  - 1 node pool of 2x NVIDIA H100 GPUs for high-performance inference
- A Qwen3 30B Thinking project with:
  - 1 vLLM service with a mounted persistent volume for model storage
  - 1 Open WebUI service with a mounted persistent volume for user interface data
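Because the vLLM service exposes an OpenAI-compatible API, you can call it from any HTTP client once the stack is up. The sketch below uses only the Python standard library; the service URL is a placeholder you must replace with your own vLLM service's code.run domain, and the `/v1/chat/completions` route and payload shape follow the standard OpenAI chat schema:

```python
import json
import urllib.request

# Placeholder -- replace with your vLLM service's code.run domain.
VLLM_URL = "https://your-vllm-service.code.run/v1/chat/completions"


def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat completion payload for the deployed model."""
    return {
        "model": "Qwen/Qwen3-30B-A3B-Thinking-2507",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask(prompt: str) -> str:
    """POST the payload to the vLLM server and return the assistant's reply."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the deployed service to be running):
#   print(ask("Write a binary search in Python."))
```

The same endpoint also works with the official OpenAI client libraries by pointing their base URL at your service.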
1. Create an account on Northflank.
2. Click **Deploy Qwen3 30B Thinking 256K on 2xH100 Now** to begin deployment.
3. Click **Deploy stack** to save and run the Qwen3 30B Thinking template.
4. Wait for the vLLM service to load the model.
5. Open the code.run domain in the Open WebUI service.
6. Create your account and start using the model.
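Open WebUI hides the model's reasoning trace for you, but if you call the API directly, a thinking model's response may include that trace before the final answer. A minimal sketch for separating the two, assuming the trace ends with a `</think>` tag (the opening `<think>` tag may be absent from the raw output, depending on the chat template):

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a thinking-model response into (reasoning, answer).

    Assumes the reasoning trace ends with a `</think>` tag; the
    opening `<think>` tag may be missing from the model output.
    """
    marker = "</think>"
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()


raw = "Let me check: 2 + 2 = 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # -> The answer is 4.
```

Alternatively, vLLM can be configured with a reasoning parser so the server itself returns the trace in a separate field of the response.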