Deploy LiteLLM on Northflank

Published 19th November 2025

LiteLLM is a lightweight, self-hosted LLM gateway (proxy server) that gives developers and teams a single OpenAI-compatible API for calling 100+ language model providers, with full control over keys, routing, and spend. It’s widely used for building AI-powered applications, integrating LLMs into software workflows, and enabling secure, real-time model inference.

With Northflank, you can deploy LiteLLM in minutes using the LiteLLM stack template. This prebuilt setup handles scaling, networking, storage, and security, so you can focus on building AI applications instead of managing servers.

What is LiteLLM?

If you’re new to the platform and wondering what LiteLLM is, it’s a self-hosted proxy server that exposes a unified, OpenAI-compatible API for calling large language models across many providers in real time.

You can store virtual API keys, model configurations, and usage data in a PostgreSQL database, manage master and salt keys for authentication and credential encryption, and serve inference requests through a simple HTTP interface. Many developers run LiteLLM with Docker, sync configuration through GitHub repositories, and use it to power AI applications, internal tools, or experimental projects that run 24/7.
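Because the proxy speaks the OpenAI API format, any OpenAI-compatible client can talk to it. Here is a minimal sketch in Python, where the base URL, API key, and model name are placeholders for your own deployment’s values:

```python
# Minimal sketch: call a LiteLLM proxy through the OpenAI SDK.
# The base URL, key, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-litellm-deployment.example.com",  # hypothetical URL
    api_key="sk-your-virtual-or-master-key",                 # placeholder key
)

response = client.chat.completions.create(
    model="gpt-4o",  # any model name configured in your proxy
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
)
print(response.choices[0].message.content)
```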

What this template deploys

The LiteLLM stack template provisions everything needed for a production-ready LiteLLM gateway environment.

It includes:

  • A deployment service running the official LiteLLM Docker image
  • A PostgreSQL addon for storing virtual API keys, model configurations, spend logs, and encrypted provider credentials
  • Preconfigured environment variables for secure, production-ready operation, including the master and salt keys

This setup keeps the LiteLLM service responsive under load by offloading persistent state, such as keys, configuration, and usage tracking, to the dedicated PostgreSQL addon.
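To make that division of labour concrete: the proxy’s documented /key/generate endpoint mints virtual API keys, and those keys, along with their budgets and spend logs, live in the PostgreSQL addon. A sketch, with the URL and master key as placeholders:

```python
# Sketch: issue a virtual API key via the proxy's /key/generate endpoint,
# authenticated with the master key. The issued key is persisted in the
# PostgreSQL addon. URL and master key are placeholders.
import requests

PROXY_URL = "https://your-litellm-deployment.example.com"  # hypothetical
MASTER_KEY = "sk-your-master-key"                          # placeholder

resp = requests.post(
    f"{PROXY_URL}/key/generate",
    headers={"Authorization": f"Bearer {MASTER_KEY}"},
    json={"duration": "30d"},  # optional expiry for the issued key
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["key"])  # hand this virtual key to an app or teammate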

How to get started

  1. Create an account on Northflank
  2. Click deploy LiteLLM now
  3. Click deploy stack to save and run the LiteLLM template
  4. Select the LiteLLM service once the deployment finishes
  5. Open the public URL to access the LiteLLM API and start sending requests (a quick health check is sketched below)
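Once the service is up, you can probe LiteLLM’s health endpoints before wiring up clients. A quick sketch, with the URL standing in for your service’s public URL:

```python
# Sketch: sanity-check a fresh deployment using LiteLLM's probe endpoints.
# The URL is a placeholder for your deployment's public URL.
import requests

PROXY_URL = "https://your-litellm-deployment.example.com"  # hypothetical

for probe in ("/health/liveliness", "/health/readiness"):
    r = requests.get(f"{PROXY_URL}{probe}", timeout=10)
    print(probe, r.status_code, r.text[:80])
```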

Key features

This stack template gives you a full-featured setup for LiteLLM inference:

  • Call 100+ LLM providers through one OpenAI-compatible API, with complete control over infrastructure
  • Store virtual keys, model configurations, and usage data securely in a PostgreSQL database
  • Manage master and salt keys for authentication and credential encryption
  • Persist configuration and runtime data across deployments
  • Serve inference requests in real time through HTTP endpoints (see the streaming sketch after this list)
  • Scale model serving without manually managing infrastructure
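As an illustration of the real-time serving mentioned above, token streaming works through the same OpenAI-compatible API; URL, key, and model name are again placeholders:

```python
# Sketch: stream tokens in real time through the proxy.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-litellm-deployment.example.com",  # hypothetical URL
    api_key="sk-your-virtual-key",                           # placeholder key
)

stream = client.chat.completions.create(
    model="gpt-4o",  # any model configured in your proxy
    messages=[{"role": "user", "content": "Write a short haiku."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content (e.g. the final one); guard for that.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```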

It follows production best practices and integrates seamlessly with GitHub repositories for version-controlled deployment and configuration updates.

How it works

  • LiteLLM Service – Hosts the LiteLLM proxy and its API, routing inference requests to your configured model providers
  • PostgreSQL Database – Stores virtual keys, model configurations, spend logs, and encrypted provider credentials
  • Environment Variables – Automatically configured for production-ready operation, including the master key, salt key, and database connection string (see the sketch below)
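For reference, these are the variable names LiteLLM documents for proxy configuration; a small sketch that verifies they are present inside the service container:

```python
# Sketch: check that the template's environment variables are set.
# Names follow LiteLLM's documented proxy configuration.
import os

for var in ("LITELLM_MASTER_KEY", "LITELLM_SALT_KEY", "DATABASE_URL"):
    print(f"{var}: {'set' if os.environ.get(var) else 'MISSING'}")
```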

Conclusion

Deploying LiteLLM on Northflank is the easiest way to run a modern, self-hosted LLM gateway that is reliable, secure, and developer-friendly.

You now have a scalable, production-ready environment for serving language model requests, with everything you need to build AI applications without worrying about infrastructure.
