Daniel Adeboye
Published 12th December 2025

How to deploy and use Ollama: step-by-step deployment guide

If you need a simple and reliable way to run open-source language models on your own infrastructure, Ollama is one of the fastest options available. It lets you run models like Llama 3, Mistral, and Qwen through a lightweight API, making it easy to integrate private and high-performance AI into your applications.

With Northflank, you can deploy Ollama in minutes using a one-click stack template or configure everything manually. Northflank manages GPU provisioning, networking, storage, and scaling so you can focus on building features instead of maintaining servers.

What is Ollama?

Ollama is a self-hosted model runtime that allows you to run large language models locally or in the cloud. It exposes a simple HTTP API for text generation, embeddings, and model management, and supports GGUF and other popular model formats. Developers commonly use Ollama to power chatbots, RAG pipelines, automations, internal agents, or any workload that requires full control over performance and data.
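For example, a locally running Ollama instance listens on port 11434 by default, and an embeddings request looks roughly like this (a minimal sketch that assumes you have already pulled a model named qwen2.5):

curl http://localhost:11434/api/embeddings -d '{
  "model": "qwen2.5",
  "prompt": "Northflank deploys GPU workloads in minutes"
}'

The response contains an embedding vector you can store in a vector database for RAG-style retrieval.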

Prerequisite

Before getting started, create a Northflank account.

What this guide covers

  • Deploying Ollama with a one-click template on Northflank
  • How to use your deployed Ollama service
  • Deploying Ollama manually on Northflank

What is Northflank?

Northflank is a platform that allows developers to build, deploy, and scale applications, services, databases, jobs, and GPU workloads on any cloud through a self-service approach. For DevOps and platform teams, Northflank provides a powerful abstraction layer over Kubernetes, enabling templated, standardized production releases with intelligent defaults while maintaining the configurability you need.

Option 1: Deploy Ollama with a one-click template

The easiest way to run Ollama on Northflank is through the Ollama one-click template. It provisions the project, GPU service, and persistent storage automatically. This is ideal if you want to run Ollama quickly without manually setting up everything.


Template overview

The Ollama stack template includes:

  • A GPU-backed deployment service using the official ollama/ollama:latest image
  • An attached persistent volume mounted at /root/.ollama to store downloaded models
  • Automatic configuration of networking and a public HTTP endpoint

This gives you a ready-to-use Ollama environment optimized for model hosting.

Getting started

  1. Visit the Ollama template on Northflank
  2. Click Deploy Ollama now
  3. Click Deploy stack to create the project, GPU service, and attached volume
  4. Wait for deployment to complete
  5. Open the service page and copy the generated public URL

Your service will receive a public endpoint similar to: https://p01--ollama-service--<id>.code.run
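Before pulling any models, you can confirm the service is reachable. Ollama's root endpoint replies with a short status message, so a plain request against your URL works as a quick health check (replace YOUR_PUBLIC_URL with the URL you copied):

curl https://YOUR_PUBLIC_URL/

If the deployment is healthy, you should see a response along the lines of "Ollama is running".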

How to use Ollama

Once the GPU instance is live, you can pull a model:

curl https://YOUR_PUBLIC_URL/api/pull -d '{
  "name": "qwen2.5"
}'

Generate text:

curl https://YOUR_PUBLIC_URL/api/generate -d '{
  "model": "qwen2.5",
  "prompt": "Why do teams use Northflank?",
  "stream": false
}'

Models you download are stored on the attached volume and persist across deploys and restarts.
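Ollama also exposes a chat endpoint that accepts a list of messages, which is more convenient than /api/generate for multi-turn conversations. A minimal sketch, reusing the qwen2.5 model pulled above:

curl https://YOUR_PUBLIC_URL/api/chat -d '{
  "model": "qwen2.5",
  "messages": [
    { "role": "user", "content": "Give me one reason to self-host an LLM." }
  ],
  "stream": false
}'

With "stream": false the server returns a single JSON response; omit it to receive the reply as a stream of newline-delimited JSON chunks.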

Option 2: Deploy Ollama manually on Northflank

If you prefer configuring everything yourself or want to customise compute plans, networking, or storage, you can deploy Ollama manually.

You can also modify the one-click template if you want to extend or customise the default deployment.

Step 1: Create a Northflank project

Log in to your Northflank dashboard, then click the “Create new” button (+ icon) in the top-right corner and select “Project” from the dropdown.


Projects serve as workspaces that group together related services, making it easier to manage multiple workloads and their associated resources.

Step 2: Configure your project

You’ll need to fill out a few details before moving forward.

  • Enter a project name, such as ollama-project, and optionally pick a colour for quick identification in your dashboard.

  • Select Northflank Cloud as the deployment target. This uses Northflank’s fully managed infrastructure, so you do not need to worry about Kubernetes setup or scaling.

    (Optional) If you prefer to run on your own infrastructure, you can select Bring Your Own Cloud and connect AWS, GCP, Azure, or on-prem resources.

  • Choose a GPU-enabled region that is closest to your users to minimise latency.

  • Click Create project to finalise the setup.


Step 3: Create a deployment service

Within your project, navigate to the Services tab in the top menu and click “Create New Service”. Select Deployment and give your service a name such as ollama.

For the deployment source, choose External image and enter the official Ollama Docker image: ollama/ollama:latest.


Select compute resources

Choose the compute size that suits your workload:

  • Enable GPU
  • Under GPU model, select A100 80GB with 1 GPU instance. A single A100 80GB is more than enough to run a model like qwen2.5 comfortably.

You can always adjust your resources later, so you can start small and scale up as needed.


Set up a port so your app is accessible:

  • Port: 11434
  • Protocol: HTTP
  • Public access: enable this to access your Ollama instance

Northflank will automatically generate a secure, unique public URL for your service. This saves you from having to manage DNS or SSL certificates manually.


Click Create service to deploy Ollama. Once the deployment succeeds, you’ll see the public URL in the top-right corner, e.g. p01--ollama--lppg6t2b6kzf.code.run.
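Before wiring up storage, you can confirm the service responds. Ollama's version endpoint returns the running server version, which makes it a handy smoke test (substitute the URL generated for your service):

curl https://p01--ollama--lppg6t2b6kzf.code.run/api/version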

Step 4: Create a persistent volume

Inside your project, go to the Volumes tab and click Create new volume.

  • Name it ollama-volume
  • Choose an access mode and storage type (Multi Read/Write is recommended)
  • Choose a storage size (start small for testing, scale up for production)
  • Set the volume mount path to /root/.ollama
  • Attach the volume to your ollama service to enable persistent storage
  • Click Create volume to finalise.


After creating the volume, restart your service so the mount takes effect. Once the restart completes, your deployed Ollama service is ready to use.
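To confirm the volume is doing its job, pull a model and then list what is stored on the instance; the tags endpoint returns the models currently held on disk (a quick sketch reusing the qwen2.5 model from earlier, with YOUR_PUBLIC_URL standing in for your service’s URL):

curl https://YOUR_PUBLIC_URL/api/pull -d '{
  "name": "qwen2.5"
}'

curl https://YOUR_PUBLIC_URL/api/tags

Because the models live on the attached volume rather than in the container filesystem, they should still appear in /api/tags after later restarts or redeploys.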

Conclusion

Deploying Ollama on Northflank gives you a fast, dependable way to run open-source language models with full control over performance and privacy. You can use the one-click template for an instant GPU-ready environment or configure everything manually for full customisation. With GPU acceleration, persistent storage, and an accessible API endpoint, you have everything you need to build private AI systems, prototypes, or production-grade workloads at scale.
