Deploy vLLM OpenAI on AWS with Northflank

Published 21st March 2025

Deploy vLLM, a high-performance serving engine for large language models (LLMs), on AWS with Northflank.

vLLM serves models with an OpenAI-compatible API endpoint, allowing you to seamlessly integrate and interact with models using familiar OpenAI API patterns and tooling.
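
Once your endpoint is live, the official OpenAI Python client can talk to it directly. Below is a minimal sketch: the base URL is a placeholder for your service's code.run domain, the API key is a dummy value (vLLM does not enforce one unless configured to), and the model name is this template's default.

```python
from openai import OpenAI

# Point the client at your vLLM service instead of api.openai.com.
# "your-service.code.run" is a placeholder for the domain Northflank
# assigns to your service; vLLM requires no API key by default, but the
# client expects a non-empty string.
client = OpenAI(
    base_url="https://your-service.code.run/v1",
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "Explain vLLM in one sentence."}],
)
print(response.choices[0].message.content)
```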

This template deploys a new cluster into your Amazon Web Services (AWS) account and creates a vLLM service with access to a GPU and a 10GB persistent disk.

It downloads and serves the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B model by default, but you can change this to any Hugging Face model you prefer.

To serve other models, you may need to increase the resources available to the service, increase the size of the persistent disk, and update the vLLM serve command.

Getting Started

  1. Create an account on Northflank
  2. Create an AWS account integration
  3. Click deploy vLLM now
  4. Click deploy stack to save and run the vLLM template
  5. Select the vLLM service when the template run has finished
  6. Open the code.run domain to check the endpoint is active (or run the check sketched below)
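
As a programmatic alternative for step 6, here is a sketch that lists the models the endpoint serves, assuming the same placeholder domain as above:

```python
from openai import OpenAI

# Placeholder domain: substitute your service's code.run domain.
client = OpenAI(base_url="https://your-service.code.run/v1", api_key="not-needed")

# vLLM exposes the OpenAI-compatible /v1/models route; a successful
# response confirms the endpoint is active and shows the model ID to query.
for model in client.models.list():
    print(model.id)
```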

You can now interact with your model served by vLLM using OpenAI API queries.
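
Streaming works through the same interface. A minimal sketch, again assuming the placeholder domain and the template's default model:

```python
from openai import OpenAI

client = OpenAI(base_url="https://your-service.code.run/v1", api_key="not-needed")

# Stream tokens as they are generated rather than waiting for the full reply.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```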

Read our guide on deploying and using DeepSeek R1 on vLLM for more information.
