Deploy vLLM OpenAI on AWS with Northflank

Published 21st March 2025

Deploy vLLM, a high-performance serving engine for large language models (LLMs), on AWS with Northflank.

vLLM serves models with an OpenAI-compatible API endpoint, allowing you to seamlessly integrate and interact with models using familiar OpenAI API patterns and tooling.
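
Once your endpoint is live, the official OpenAI Python client can talk to it directly. Below is a minimal sketch: the base URL is a placeholder for your service's code.run domain, the API key is a dummy value (vLLM does not enforce one unless configured to), and the model name is this template's default.

```python
from openai import OpenAI

# Point the client at your vLLM service instead of api.openai.com.
# "your-service.code.run" is a placeholder for the domain Northflank
# assigns to your service; vLLM requires no API key by default, but the
# client expects a non-empty string.
client = OpenAI(
    base_url="https://your-service.code.run/v1",
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "Explain vLLM in one sentence."}],
)
print(response.choices[0].message.content)
```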

This template deploys a new cluster into your Amazon Web Services (AWS) account and creates a vLLM service with access to a GPU and a 10GB persistent disk.

It downloads and serves the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B model by default, but you can change this to any Hugging Face model you prefer.

To serve other models, you may need to increase the resources available to the service, increase the size of the persistent disk, and update the vLLM serve command.

Getting Started

  1. Create an account on Northflank
  2. Create an AWS account integration
  3. Click deploy vLLM now
  4. Click deploy stack to save and run the vLLM template
  5. Select the vLLM service when the template run has finished
  6. Open the code.run domain to check the endpoint is active (or run the check sketched below)
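
As a programmatic alternative for step 6, here is a sketch that lists the models the endpoint serves, assuming the same placeholder domain as above:

```python
from openai import OpenAI

# Placeholder domain: substitute your service's code.run domain.
client = OpenAI(base_url="https://your-service.code.run/v1", api_key="not-needed")

# vLLM exposes the OpenAI-compatible /v1/models route; a successful
# response confirms the endpoint is active and shows the model ID to query.
for model in client.models.list():
    print(model.id)
```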

You can now interact with your model served by vLLM using OpenAI API queries.
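
Streaming works through the same interface. A minimal sketch, again assuming the placeholder domain and the template's default model:

```python
from openai import OpenAI

client = OpenAI(base_url="https://your-service.code.run/v1", api_key="not-needed")

# Stream tokens as they are generated rather than waiting for the full reply.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```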

Read our guide on deploying and using DeepSeek R1 on vLLM for more information.
