Deploy Deepseek v3.1 on Northflank
Published 22nd August 2025

By 
Northflank
This guide helps you deploy DeepSeek-V3.1 on Northflank using a ready-made template pattern similar to our Qwen stack. The service runs on vLLM, giving you high-throughput, OpenAI-compatible inference and a user-friendly Open WebUI out of the box. DeepSeek-V3.1 supports both thinking and non-thinking chat modes via its chat template, and offers a 128K context window.
- 1 cluster in Northflank’s cloud:
- 1 node pool of 8 × NVIDIA H200 GPUs for high-throughput MoE inference
 
- A DeepSeek-V3.1 project consisting of:
- 1 vLLM service with a mounted persistent volume for caching the 671B-parameter MoE model
- 1 Open WebUI service with a mounted persistent volume for UI data and user sessions
 
- Create an account on Northflank.
- Click Deploy Deepseek v3.1 Nowto begin deployment.
- Click Deploy stackto save and run the DeepSeek-V3.1 template.
- Wait for the vLLM service to load the model shards into GPU memory and begin the inference engine.
- Open the code.run domain in the Open WebUI service.
- Create your user account in the WebUI and start interacting with DeepSeek-V3.1.
Once deployed, you’ll have an OpenAI-compatible endpoint for fast, low-latency inference and a full-featured web interface for rapid prototyping, reasoning, and exploration.