Deploy Qwen3 30B Instruct 256K on 2xH100 on Northflank

Published 4th August 2025

Qwen3-30B-A3B-Instruct-2507 is an open-source large language model (LLM) well suited to coding tasks. It uses a 30B-parameter Mixture of Experts (MoE) architecture that activates only 3B parameters per token, with benchmark results competitive with Claude Sonnet 4.

It shines in few-shot learning, instruction-following, and multi-turn dialogue, with a native 256K-token context window, extendable to 1M tokens, for processing lengthy instructions and documents.

Developers can rely on it for efficient, high-performance coding support.

This guide shows you how to deploy Qwen3-30B-A3B-Instruct-2507 on Northflank using a stack template, so you can start coding with it in minutes.

This stack template creates a cluster in Northflank’s cloud and sets up a vLLM service for fast, OpenAI-compatible inference of Qwen3-30B-A3B-Instruct-2507, plus an Open WebUI for easy interaction. With Northflank’s straightforward deployment and GPU-powered infrastructure, your coding assistant will be ready for complex tasks with low latency.
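Because the vLLM service exposes an OpenAI-compatible API, any OpenAI-style client can talk to it once it's running. As a minimal sketch of what that means in practice (the service URL below is a placeholder; Northflank assigns your actual code.run domain), this is the shape of a chat-completions request the endpoint accepts:

```python
import json

# Placeholder for the code.run domain Northflank assigns to your vLLM service.
VLLM_BASE_URL = "https://example--vllm.code.run/v1"

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible /chat/completions payload for vLLM."""
    return {
        "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Write a Python function that reverses a string.")
# POST this JSON to f"{VLLM_BASE_URL}/chat/completions" with any HTTP client.
print(json.dumps(payload, indent=2))
```

Because the request format matches OpenAI's, existing tooling (SDKs, coding assistants, LangChain, etc.) can be pointed at the deployed service by swapping in its base URL.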

What this template deploys

  • 1 cluster in Northflank’s cloud with:
    • 1 node pool of 2xNVIDIA H100 GPUs for high-performance inference
  • A Qwen3 30B Instruct project with:
    • 1 vLLM service with a mounted persistent volume for model storage
    • 1 Open WebUI service with a mounted persistent volume for user interface data

How to get started

  1. Create an account on Northflank.
  2. Click Deploy Qwen3 30B Instruct 256K on 2xH100 Now to begin deployment.
  3. Click Deploy stack to save and run the Qwen3 30B Instruct template.
  4. Wait for the vLLM service to download and load the model into its persistent volume; this can take several minutes.
  5. Open the code.run domain assigned to the Open WebUI service.
  6. Create your account and start using the model.
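Beyond the web UI, you can also call the model from code once the service is up. A minimal stdlib-only sketch, assuming the placeholder domain below is replaced with your vLLM service's actual code.run URL:

```python
import json
import urllib.request

# Placeholder: replace with the code.run domain of your vLLM service.
BASE_URL = "https://example--vllm.code.run/v1"

def chat(prompt: str, max_tokens: int = 512) -> str:
    """Send one user turn to the OpenAI-compatible endpoint and return the reply."""
    payload = {
        "model": "Qwen/Qwen3-30B-A3B-Instruct-2507",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # vLLM returns the standard OpenAI chat-completions response shape.
    return body["choices"][0]["message"]["content"]

# Once deployed, for example:
# print(chat("Refactor this loop into a list comprehension: ..."))
```

The same call works with the official OpenAI Python SDK by setting its `base_url` to your service's `/v1` endpoint.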