Deploy Qwen3 30B Coder 256K on 2xH100 on Northflank

Qwen3-Coder-30B-A3B-Instruct is an open-weight large language model (LLM) built for coding tasks. It uses a 30.5B-parameter Mixture of Experts (MoE) architecture with only 3.3B active parameters per token, delivering performance comparable to Claude Sonnet 4 on coding benchmarks.
It excels at agentic coding, agentic browser use, and other foundational coding tasks, with a native 256K-token context window that can be extended to 1M tokens for repository-scale understanding.
The result is an efficient, high-performance coding assistant developers can rely on.
This guide shows you how to deploy Qwen3-Coder-30B-A3B-Instruct on Northflank using a stack template, so you can start coding with it in minutes.
This stack template creates a cluster in Northflank’s cloud and sets up a vLLM service for fast, OpenAI-compatible inference of Qwen3-Coder-30B-A3B-Instruct, plus an Open WebUI for easy interaction. With Northflank’s straightforward deployment and GPU-powered infrastructure, your coding assistant will be ready for complex tasks with low latency.
- 1 cluster in Northflank’s cloud with:
  - 1 node pool of 2x NVIDIA H100 GPUs for high-performance inference
- A Qwen3 30B Coder project with:
  - 1 vLLM service with a mounted persistent volume for model storage
  - 1 Open WebUI service with a mounted persistent volume for user interface data
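For reference, a vLLM service for this model on two H100s typically runs a command along these lines. This is a sketch, not the template's exact invocation: the model ID is the official Hugging Face repo, and the flags shown are standard vLLM options.

```shell
# Assumed vLLM launch command (not necessarily the template's exact one).
# --tensor-parallel-size 2 shards the model across both H100 GPUs;
# --max-model-len 262144 enables the full 256K-token context window.
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
  --tensor-parallel-size 2 \
  --max-model-len 262144 \
  --host 0.0.0.0 \
  --port 8000
```

Tensor parallelism splits each layer's weights across the two GPUs, which is what lets a 30.5B-parameter model with a 256K context fit and serve with low latency.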
- Create an account on Northflank.
- Click **Deploy Qwen3 30B Coder 256K on 2xH100 Now** to begin deployment.
- Click **Deploy stack** to save and run the Qwen3 30B Coder template.
- Wait for the vLLM service to load the model.
- Open the code.run domain in the Open WebUI service.
- Create your account and start using the model.
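Beyond the Open WebUI, the vLLM service exposes the standard OpenAI-compatible `/v1/chat/completions` route, so you can call it programmatically. A minimal stdlib-only sketch follows; `BASE_URL` is a placeholder you'd replace with your vLLM service's own code.run domain from the Northflank dashboard.

```python
# Minimal sketch of calling the deployed model's OpenAI-compatible API.
# BASE_URL is a hypothetical placeholder -- substitute your own vLLM
# service's code.run domain.
import json
from urllib import request

BASE_URL = "https://your-vllm-service.code.run"  # placeholder


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for the model."""
    return {
        "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }


def chat(prompt: str) -> str:
    """POST the payload to vLLM's /v1/chat/completions and return the reply."""
    payload = build_chat_request(prompt)
    req = request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example usage (requires the deployed service to be reachable):
# print(chat("Write a Python function that reverses a string."))
```

Because the endpoint speaks the OpenAI protocol, any OpenAI-compatible SDK or tool (the official `openai` client, coding agents, IDE plugins) can point at the same URL by setting its base URL.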