Deploy MiniMax M2 on Northflank

Published 6th November 2025

This guide helps you deploy MiniMax M2 with a 128k context window on Northflank using a ready-made template. Inference is served by vLLM, so you get high-performance, scalable inference paired with a user-friendly Open WebUI out of the box.

What this template deploys

  • A MiniMax M2 project consisting of:
    • 1 vLLM MiniMax M2 service with 2 × NVIDIA H100 GPUs for high-throughput inference
    • 1 mounted persistent volume for storing the model, enabling fast service restarts
    • 1 Open WebUI service with a mounted persistent volume for UI data and user sessions
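
For context, here is a minimal Python sketch of how vLLM loads a model across two GPUs with tensor parallelism. The template runs vLLM's OpenAI-compatible server for you, so this is purely illustrative; the model ID and settings below are assumptions, not the template's exact configuration.

```python
# Illustrative sketch only: the template deploys vLLM as a server,
# but this shows the same model/GPU setup via the offline API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2",  # assumed Hugging Face model ID
    tensor_parallel_size=2,        # shard weights across the 2 H100s
    max_model_len=131072,          # the 128k context window
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)
```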

How to get started

  1. Create an account on Northflank.
  2. Click Deploy MiniMax M2 Now to begin deployment.
  3. Click Deploy stack to save and run the MiniMax M2 template.
  4. Wait for the vLLM service to load the model into GPU memory and start the inference engine. You can confirm this in the service logs, or by querying the endpoint as sketched after this list; the first run takes about 20 minutes.
  5. Open the code.run domain in the Open WebUI service.
  6. Create your admin account in the WebUI and start interacting with MiniMax M2.
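
To confirm step 4 programmatically, you can query the vLLM service's OpenAI-compatible /v1/models endpoint once the engine is up. This is a hedged sketch: the domain below is a placeholder to replace with your own vLLM service's code.run domain.

```python
# Check that the vLLM server is ready by listing its served models.
import requests

BASE_URL = "https://your-vllm-service.code.run"  # placeholder domain

resp = requests.get(f"{BASE_URL}/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json()["data"]:
    print("Model loaded and served:", model["id"])
```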

Once deployed, you’ll have an OpenAI-compatible endpoint for fast, low-latency inference and a full-featured web interface for rapid responses, coding assistance, and exploration.
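
As a quick way to try the endpoint outside the WebUI, here is a minimal sketch using the official openai Python client. The base URL and model name are placeholders; check your deployed service for the actual values.

```python
# Call the OpenAI-compatible vLLM endpoint with the openai client.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-vllm-service.code.run/v1",  # placeholder domain
    api_key="not-needed",  # vLLM accepts any key unless one is configured
)

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # assumed served model name
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
)
print(response.choices[0].message.content)
```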
