Deploy Qwen3 235B Thinking 256K on 8xH200 on Northflank

Published 4th August 2025

Qwen3-235B-A22B-Thinking-2507 is an open-source large language model (LLM) optimized for reasoning and coding tasks. It uses a 235B-parameter Mixture of Experts (MoE) architecture that activates only 22B parameters per token, and it is competitive with Claude Opus 4 on reasoning and coding benchmarks.

It shines in complex chain-of-thought reasoning, multi-step problem solving, and tool-augmented workflows, with a 262K context window that can extend to 1M for handling extensive documents and long-horizon dialogues.

Developers can rely on it for efficient, high-performance coding support.

This guide shows you how to deploy Qwen3-235B-A22B-Thinking-2507 on Northflank using a stack template, so you can start coding with it in minutes.

This stack template creates a cluster in Northflank’s cloud and sets up a vLLM service for fast, OpenAI-compatible inference of Qwen3-235B-A22B-Thinking-2507, plus an Open WebUI for easy interaction.
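Because vLLM exposes an OpenAI-compatible API, you can query the deployed model with any OpenAI-style client or plain HTTP. Here is a minimal sketch using only the Python standard library; the `BASE_URL` is a hypothetical placeholder for the code.run domain Northflank assigns to your vLLM service.

```python
import json
import urllib.request

# Hypothetical service URL -- replace with the code.run domain
# shown on your vLLM service in Northflank.
BASE_URL = "https://example--vllm.code.run/v1"
MODEL = "Qwen/Qwen3-235B-A22B-Thinking-2507"


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-compatible /chat/completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }


def chat(prompt: str) -> str:
    """POST the payload to the vLLM server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the deployed service to be running):
#   print(chat("Write a Python function that reverses a linked list."))
```

The same endpoint works with the official OpenAI SDKs by pointing their base URL at your service's code.run domain.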

What this template deploys

  • 1 cluster in Northflank’s cloud with:
    • 1 node pool of 8xNVIDIA H200 GPUs for high-performance inference
  • A Qwen3 235B Thinking project with:
    • 1 vLLM service with a mounted persistent volume for model storage
    • 1 Open WebUI service with a mounted persistent volume for user interface data

How to get started

  1. Create an account on Northflank.
  2. Click Deploy Qwen3 235B Thinking 256K on 8xH200 Now to begin deployment.
  3. Click Deploy stack to save and run the Qwen3 235B Thinking template.
  4. Wait for the vLLM service to load the model.
  5. Open the code.run domain in the Open WebUI service.
  6. Create your account and start using the model.
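Step 4 can take a while, since the model weights are large. One way to check readiness programmatically is to poll the server's OpenAI-compatible `/v1/models` route, which only responds with the model list once vLLM has finished loading. A hedged sketch, again assuming a placeholder `BASE_URL` for your service's code.run domain:

```python
import json
import time
import urllib.error
import urllib.request

# Hypothetical service URL -- replace with your vLLM service's code.run domain.
BASE_URL = "https://example--vllm.code.run/v1"


def parse_model_ids(body: dict) -> list:
    """Extract model IDs from an OpenAI-style /models response."""
    return [m["id"] for m in body.get("data", [])]


def wait_until_ready(timeout_s: int = 1800, poll_s: int = 30) -> list:
    """Poll /models until vLLM has loaded the model, then return the IDs."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{BASE_URL}/models", timeout=10) as resp:
                return parse_model_ids(json.load(resp))
        except (urllib.error.URLError, OSError):
            time.sleep(poll_s)  # not up yet; retry until the deadline
    raise TimeoutError("vLLM did not become ready in time")
```

Once `wait_until_ready()` returns the Qwen3 model ID, the Open WebUI service should also be able to reach the model.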