Deploy Qwen3 30B Coder 256K on 2xH100 on Northflank

Qwen3-Coder-30B-A3B-Instruct is an open-weight large language model (LLM) built for coding tasks. It uses a 30.5B-parameter Mixture of Experts (MoE) architecture with only 3.3B active parameters per token, delivering performance comparable to Claude Sonnet 4 on coding benchmarks.
It excels at agentic coding, agentic browser use, and other foundational coding tasks, with a native 256K-token context window that can be extended to 1M tokens for repository-scale understanding.
The result is an efficient, high-performance coding assistant developers can rely on.
This guide shows you how to deploy Qwen3-Coder-30B-A3B-Instruct on Northflank using a stack template, so you can start coding with it in minutes.
This stack template creates a cluster in Northflank’s cloud and sets up a vLLM service for fast, OpenAI-compatible inference of Qwen3-Coder-30B-A3B-Instruct, plus an Open WebUI for easy interaction. With Northflank’s straightforward deployment and GPU-powered infrastructure, your coding assistant will be ready for complex tasks with low latency.
- 1 cluster in Northflank’s cloud with:
  - 1 node pool of 2x NVIDIA H100 GPUs for high-performance inference
- A Qwen3 30B Coder project with:
  - 1 vLLM service with a mounted persistent volume for model storage
  - 1 Open WebUI service with a mounted persistent volume for user interface data
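For reference, a vLLM service for this model on two H100s typically runs a command along these lines. This is a sketch, not the template's exact invocation: the model ID is the official Hugging Face repo, and the flags shown are standard vLLM options.

```shell
# Assumed vLLM launch command (not necessarily the template's exact one).
# --tensor-parallel-size 2 shards the model across both H100 GPUs;
# --max-model-len 262144 enables the full 256K-token context window.
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
  --tensor-parallel-size 2 \
  --max-model-len 262144 \
  --host 0.0.0.0 \
  --port 8000
```

Tensor parallelism splits each layer's weights across the two GPUs, which is what lets a 30.5B-parameter model with a 256K context fit and serve with low latency.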
- Create an account on Northflank.
- Click **Deploy Qwen3 30B Coder 256K on 2xH100 Now** to begin deployment.
- Click **Deploy stack** to save and run the Qwen3 30B Coder template.
- Wait for the vLLM service to load the model.
- Open the code.run domain in the Open WebUI service.
- Create your account and start using the model.
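Beyond the Open WebUI, the vLLM service exposes the standard OpenAI-compatible `/v1/chat/completions` route, so you can call it programmatically. A minimal stdlib-only sketch follows; `BASE_URL` is a placeholder you'd replace with your vLLM service's own code.run domain from the Northflank dashboard.

```python
# Minimal sketch of calling the deployed model's OpenAI-compatible API.
# BASE_URL is a hypothetical placeholder -- substitute your own vLLM
# service's code.run domain.
import json
from urllib import request

BASE_URL = "https://your-vllm-service.code.run"  # placeholder


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for the model."""
    return {
        "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }


def chat(prompt: str) -> str:
    """POST the payload to vLLM's /v1/chat/completions and return the reply."""
    payload = build_chat_request(prompt)
    req = request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example usage (requires the deployed service to be reachable):
# print(chat("Write a Python function that reverses a string."))
```

Because the endpoint speaks the OpenAI protocol, any OpenAI-compatible SDK or tool (the official `openai` client, coding agents, IDE plugins) can point at the same URL by setting its base URL.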