Deploy LiteLLM on Northflank

Published 19th November 2025

LiteLLM is a lightweight, self-hosted LLM gateway (proxy server) that gives developers and teams a single OpenAI-compatible API for calling 100+ language model providers, with full control over keys, routing, and spend. It’s widely used for building AI-powered applications, integrating LLMs into software workflows, and enabling secure, real-time model inference.

With Northflank, you can deploy LiteLLM in minutes using the LiteLLM stack template. This prebuilt setup handles scaling, networking, storage, and security, so you can focus on building AI applications instead of managing servers.

What is LiteLLM?

If you’re new to the platform and wondering what LiteLLM is, it’s a self-hosted proxy server that exposes a unified, OpenAI-compatible API for calling large language models across many providers in real time.

You can store virtual API keys, model configurations, and usage data in a PostgreSQL database, manage master and salt keys for authentication and credential encryption, and serve inference requests through a simple HTTP interface. Many developers run LiteLLM with Docker, sync configuration through GitHub repositories, and use it to power AI applications, internal tools, or experimental projects that run 24/7.
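Because the proxy speaks the OpenAI API format, any OpenAI-compatible client can talk to it. Here is a minimal sketch in Python, where the base URL, API key, and model name are placeholders for your own deployment’s values:

```python
# Minimal sketch: call a LiteLLM proxy through the OpenAI SDK.
# The base URL, key, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-litellm-deployment.example.com",  # hypothetical URL
    api_key="sk-your-virtual-or-master-key",                 # placeholder key
)

response = client.chat.completions.create(
    model="gpt-4o",  # any model name configured in your proxy
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
)
print(response.choices[0].message.content)
```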

What this template deploys

The LiteLLM stack template provisions everything needed for a production-ready LiteLLM gateway environment.

It includes:

  • A deployment service running the official LiteLLM Docker image
  • A PostgreSQL addon for storing virtual API keys, model configurations, spend logs, and encrypted provider credentials
  • Preconfigured environment variables for secure, production-ready operation, including the master and salt keys

This setup keeps the LiteLLM service responsive under load by offloading persistent state, such as keys, configuration, and usage tracking, to the dedicated PostgreSQL addon.
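To make that division of labour concrete: the proxy’s documented /key/generate endpoint mints virtual API keys, and those keys, along with their budgets and spend logs, live in the PostgreSQL addon. A sketch, with the URL and master key as placeholders:

```python
# Sketch: issue a virtual API key via the proxy's /key/generate endpoint,
# authenticated with the master key. The issued key is persisted in the
# PostgreSQL addon. URL and master key are placeholders.
import requests

PROXY_URL = "https://your-litellm-deployment.example.com"  # hypothetical
MASTER_KEY = "sk-your-master-key"                          # placeholder

resp = requests.post(
    f"{PROXY_URL}/key/generate",
    headers={"Authorization": f"Bearer {MASTER_KEY}"},
    json={"duration": "30d"},  # optional expiry for the issued key
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["key"])  # hand this virtual key to an app or teammate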

How to get started

  1. Create an account on Northflank
  2. Click deploy LiteLLM now
  3. Click deploy stack to save and run the LiteLLM template
  4. Select the LiteLLM service once the deployment finishes
  5. Open the public URL to access the LiteLLM API and start sending requests (a quick health check is sketched below)
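Once the service is up, you can probe LiteLLM’s health endpoints before wiring up clients. A quick sketch, with the URL standing in for your service’s public URL:

```python
# Sketch: sanity-check a fresh deployment using LiteLLM's probe endpoints.
# The URL is a placeholder for your deployment's public URL.
import requests

PROXY_URL = "https://your-litellm-deployment.example.com"  # hypothetical

for probe in ("/health/liveliness", "/health/readiness"):
    r = requests.get(f"{PROXY_URL}{probe}", timeout=10)
    print(probe, r.status_code, r.text[:80])
```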

Key features

This stack template gives you a full-featured setup for LiteLLM inference:

  • Call 100+ LLM providers through one OpenAI-compatible API, with complete control over infrastructure
  • Store virtual keys, model configurations, and usage data securely in a PostgreSQL database
  • Manage master and salt keys for authentication and credential encryption
  • Persist configuration and runtime data across deployments
  • Serve inference requests in real time through HTTP endpoints (see the streaming sketch after this list)
  • Scale model serving without manually managing infrastructure
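As an illustration of the real-time serving mentioned above, token streaming works through the same OpenAI-compatible API; URL, key, and model name are again placeholders:

```python
# Sketch: stream tokens in real time through the proxy.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-litellm-deployment.example.com",  # hypothetical URL
    api_key="sk-your-virtual-key",                           # placeholder key
)

stream = client.chat.completions.create(
    model="gpt-4o",  # any model configured in your proxy
    messages=[{"role": "user", "content": "Write a short haiku."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content (e.g. the final one); guard for that.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```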

It follows production best practices and integrates seamlessly with GitHub repositories for version-controlled deployment and configuration updates.

How it works

  • LiteLLM Service – Hosts the LiteLLM proxy and its API, routing inference requests to your configured model providers
  • PostgreSQL Database – Stores virtual keys, model configurations, spend logs, and encrypted provider credentials
  • Environment Variables – Automatically configured for production-ready operation, including the master key, salt key, and database connection string (see the sketch below)
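For reference, these are the variable names LiteLLM documents for proxy configuration; a small sketch that verifies they are present inside the service container:

```python
# Sketch: check that the template's environment variables are set.
# Names follow LiteLLM's documented proxy configuration.
import os

for var in ("LITELLM_MASTER_KEY", "LITELLM_SALT_KEY", "DATABASE_URL"):
    print(f"{var}: {'set' if os.environ.get(var) else 'MISSING'}")
```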

Conclusion

Deploying LiteLLM on Northflank is the easiest way to run a modern, self-hosted LLM gateway that is reliable, secure, and developer-friendly.

You now have a scalable, production-ready environment for serving language model requests, with everything you need to build AI applications without worrying about infrastructure.
