Deploy DeepSeek-V3.1 on Northflank

Published 22nd August 2025

This guide walks you through deploying DeepSeek-V3.1 on Northflank using a ready-made template, following the same pattern as our Qwen stack. The service runs on vLLM, giving you high-throughput, OpenAI-compatible inference and a user-friendly Open WebUI out of the box. DeepSeek-V3.1 supports both thinking and non-thinking chat modes via its chat template and offers a 128K context window.

What this template deploys

  • 1 cluster in Northflank’s cloud:
    • 1 node pool of 8 × NVIDIA H200 GPUs for high-throughput MoE inference
  • A DeepSeek-V3.1 project consisting of:
    • 1 vLLM service with a mounted persistent volume for caching the 671B-parameter MoE model
    • 1 Open WebUI service with a mounted persistent volume for UI data and user sessions
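
The template configures all of this for you, but as a rough sketch of the kind of vLLM invocation involved, something like the following runs inside the vLLM service. The model ID, cache path, and port here are illustrative assumptions, not the template's exact configuration; the flags themselves are standard vLLM options.

```python
# Illustrative sketch only: the Northflank template manages this for you.
# The cache path and port are assumptions for the example.
import subprocess

subprocess.run(
    [
        "vllm", "serve", "deepseek-ai/DeepSeek-V3.1",  # 671B-parameter MoE model
        "--tensor-parallel-size", "8",   # shard the model across the 8 H200 GPUs
        "--download-dir", "/cache",      # persistent volume caching the weights
        "--port", "8000",                # OpenAI-compatible API port
    ],
    check=True,
)
```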

How to get started

  1. Create an account on Northflank.
  2. Click Deploy DeepSeek-V3.1 now to begin deployment.
  3. Click Deploy stack to save and run the DeepSeek-V3.1 template.
  4. Wait for the vLLM service to load the model shards into GPU memory and start the inference engine (you can also poll the API for readiness; see the sketch after these steps).
  5. Open the code.run domain exposed by the Open WebUI service.
  6. Create your user account in the WebUI and start interacting with DeepSeek-V3.1.
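
Step 4 can take a while for a model of this size. If you'd rather poll programmatically than watch the logs, the vLLM service exposes the standard OpenAI-compatible /v1/models route once the engine is up. A minimal sketch, where the base URL is a placeholder for your vLLM service's code.run domain:

```python
# Poll the vLLM service until the model is loaded and the API responds.
# BASE_URL is a placeholder: substitute your vLLM service's code.run domain.
import time
import requests

BASE_URL = "https://your-vllm-service.code.run"

while True:
    try:
        resp = requests.get(f"{BASE_URL}/v1/models", timeout=5)
        if resp.status_code == 200:
            print("Model ready:", resp.json()["data"][0]["id"])
            break
    except requests.RequestException:
        pass  # service still starting; keep waiting
    time.sleep(15)
```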

Once deployed, you’ll have an OpenAI-compatible endpoint for low-latency inference and a full-featured web interface for rapid prototyping, reasoning, and exploration.
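
As a minimal sketch of client usage, here is a chat completion against that endpoint using the OpenAI Python client. The base URL and API key are placeholders for your deployment, and passing chat_template_kwargs (which vLLM forwards to the model's chat template) is our assumption for how to switch DeepSeek-V3.1 between thinking and non-thinking modes:

```python
# Minimal chat-completion sketch against the deployed endpoint.
# BASE_URL and the API key are placeholders for your deployment's values.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-vllm-service.code.run/v1",
    api_key="not-needed-unless-you-enabled-auth",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",
    messages=[{"role": "user", "content": "Summarise what a MoE model is."}],
    # Assumption: vLLM forwards chat_template_kwargs to the model's chat
    # template, which is how DeepSeek-V3.1 toggles thinking mode.
    extra_body={"chat_template_kwargs": {"thinking": False}},
)

print(response.choices[0].message.content)
```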
