Deploy Qwen3 30B Thinking 256K on 2xH100 on Northflank

Qwen3-30B-A3B-Thinking-2507 is an open-weight large language model (LLM) built for complex reasoning and coding tasks. It uses a 30B-parameter Mixture-of-Experts (MoE) architecture that activates only about 3B parameters per token, keeping inference efficient while posting benchmark results competitive with Claude Sonnet 4.
It shines in chain-of-thought reasoning, multi-step problem solving, and tool-augmented workflows, and its native 256K-token context window can be extended to 1M tokens for handling extensive documents and long-horizon dialogues.
That combination makes it a practical choice for efficient, high-performance coding support.
This guide shows you how to deploy Qwen3-30B-A3B-Thinking-2507 on Northflank using a stack template, so you can start coding with it in minutes.
This stack template creates a cluster in Northflank’s cloud and sets up a vLLM service for fast, OpenAI-compatible inference of Qwen3-30B-A3B-Thinking-2507, plus an Open WebUI for easy interaction.
- 1 cluster in Northflank’s cloud with:
  - 1 node pool of 2x NVIDIA H100 GPUs for high-performance inference
- A Qwen3 30B Thinking project with:
  - 1 vLLM service with a mounted persistent volume for model storage
  - 1 Open WebUI service with a mounted persistent volume for user interface data
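Because the vLLM service exposes an OpenAI-compatible API, you can call it from any HTTP client once the stack is up. The sketch below uses only the Python standard library; the service URL is a placeholder you must replace with your own vLLM service's code.run domain, and the `/v1/chat/completions` route and payload shape follow the standard OpenAI chat schema:

```python
import json
import urllib.request

# Placeholder -- replace with your vLLM service's code.run domain.
VLLM_URL = "https://your-vllm-service.code.run/v1/chat/completions"


def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat completion payload for the deployed model."""
    return {
        "model": "Qwen/Qwen3-30B-A3B-Thinking-2507",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask(prompt: str) -> str:
    """POST the payload to the vLLM server and return the assistant's reply."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the deployed service to be running):
#   print(ask("Write a binary search in Python."))
```

The same endpoint also works with the official OpenAI client libraries by pointing their base URL at your service.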
1. Create an account on Northflank.
2. Click **Deploy Qwen3 30B Thinking 256K on 2xH100 Now** to begin deployment.
3. Click **Deploy stack** to save and run the Qwen3 30B Thinking template.
4. Wait for the vLLM service to load the model.
5. Open the code.run domain in the Open WebUI service.
6. Create your account and start using the model.
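Open WebUI hides the model's reasoning trace for you, but if you call the API directly, a thinking model's response may include that trace before the final answer. A minimal sketch for separating the two, assuming the trace ends with a `</think>` tag (the opening `<think>` tag may be absent from the raw output, depending on the chat template):

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a thinking-model response into (reasoning, answer).

    Assumes the reasoning trace ends with a `</think>` tag; the
    opening `<think>` tag may be missing from the model output.
    """
    marker = "</think>"
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()


raw = "Let me check: 2 + 2 = 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # -> The answer is 4.
```

Alternatively, vLLM can be configured with a reasoning parser so the server itself returns the trace in a separate field of the response.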