Deploy Qwen3 4B Instruct-2507 on Northflank

Published 6th August 2025

This guide walks you through deploying Qwen3 4B Instruct-2507 on Northflank using a ready-made template. Inference is served by vLLM, so you get its high-performance, scalable inference engine together with a user-friendly Open WebUI frontend with minimal setup.

What this template deploys

  • A Qwen3 4B Instruct-2507 project consisting of:
    • 1 vLLM Qwen3 4B Instruct-2507 service with 1 × NVIDIA A100 40GB GPU for high-throughput inference
    • 1 mounted persistent volume for storing the model weights, enabling fast service restarts
    • 1 Open WebUI service with a mounted persistent volume for UI data and user sessions

How to get started

  1. Create an account on Northflank.
  2. Click Deploy Qwen3 4B Instruct-2507 Now to begin deployment.
  3. Click Deploy stack to save and run the Qwen3 4B Instruct-2507 template.
  4. Wait for the vLLM service to load the model into GPU memory and start the inference engine. You can confirm this by viewing the service logs; the first run takes about 10 minutes while the model downloads.
  5. Open the code.run domain in the Open WebUI service.
  6. Create your admin account in the WebUI and start interacting with Qwen3 4B Instruct-2507.

Once deployed, you’ll have an OpenAI-compatible endpoint for fast, low-latency inference and a full-featured web interface for rapid responses, coding assistance, and exploration.
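Because vLLM exposes an OpenAI-compatible API, you can call the deployed service from any OpenAI-style client or plain HTTP. The sketch below builds a standard `/v1/chat/completions` request body; the base URL and model identifier are placeholders (assumptions, not values from the template) — substitute your service's code.run domain and the model name shown in your vLLM logs.

```python
import json

# Hypothetical endpoint and model name — replace with your own
# service's code.run domain and the model ID your vLLM instance serves.
BASE_URL = "https://your-vllm-service.code.run/v1"
MODEL = "Qwen/Qwen3-4B-Instruct-2507"


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload for vLLM."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.7,
    }


payload = build_chat_request("Write a haiku about GPUs.")
print(json.dumps(payload, indent=2))
# POST this JSON to {BASE_URL}/chat/completions with any HTTP client,
# or point the official openai Python client at BASE_URL instead.
```

The same payload works with the `openai` Python package by setting `base_url` to your service's `/v1` endpoint; vLLM ignores the API key unless you configured one.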
