Deploy an OpenAI-compatible vLLM server on AWS with Northflank

Deploy vLLM, a high-performance serving engine for Large Language Models (LLMs), on AWS with Northflank.
vLLM serves models with an OpenAI-compatible API endpoint, allowing you to seamlessly integrate and interact with models using familiar OpenAI API patterns and tooling.
This template deploys a new cluster into your AWS account and creates a vLLM service with access to a GPU and a 10GB persistent disk.
It downloads and serves the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B model by default, but you can change this to any Hugging Face model you prefer.
For larger models, you may need to increase the compute resources available to the service, expand the persistent disk, and update the vLLM serve command to match (see the example command below).
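As a rough sketch, an updated serve command for a bigger model might look like the following. The model name, context length, and memory utilisation here are illustrative assumptions, not values from the template:

```bash
# Illustrative only: serve a larger model with an increased context window.
# The model name and flag values are assumptions -- adjust for your workload.
vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90
```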
- Create an account on Northflank
- Create an AWS account integration
- Click deploy vLLM now
- Click deploy stack to save and run the vLLM template
- Select the vLLM service when the template run has finished
- Open the code.run domain to check that the endpoint is active (see the check below)
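One quick way to confirm the endpoint is live is to list the models the server exposes. The hostname below is a placeholder; substitute the code.run domain shown on your service:

```bash
# Placeholder hostname: replace with your service's code.run domain.
curl https://your-vllm-service.code.run/v1/models
```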
You can now interact with your model served by vLLM using OpenAI API queries.
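For instance, here is a minimal chat completion request, assuming the template's default model and, again, a placeholder hostname in place of your code.run domain:

```bash
# Placeholder hostname: replace with your service's code.run domain.
curl https://your-vllm-service.code.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```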
Read our guide on deploying and using DeepSeek R1 on vLLM for more information.