Deborah Emeni
Published 25th July 2025

Best GPUs for AI workloads (and how to run them on Northflank)

If you've worked with models like Qwen, DeepSeek, or LLaMA, you already know different workloads push your GPU in different ways. Some need lots of memory just to load; others mainly need hardware that won't slow down during inference.

That’s why I started looking into which GPUs people rely on for AI workloads, and how you can run them without spending thousands on hardware upfront.

Northflank makes that possible. You can run high-end GPUs like the H100, A100, or 4090, use your own cloud setup if you have one, and get all the tools to train, serve, and deploy your models in one place.

In this article, I’ll help you figure out which GPUs are best for different AI use cases, and how to start using them without owning the hardware yourself.

TL;DR: Best GPUs by use case + how to run them without owning hardware

If you're working on training, fine-tuning, or running inference on large models, the consensus is that VRAM matters more than clock speed. More memory lets you fit bigger models, longer contexts, and larger batches before you hit out-of-memory errors.

What most people recommend:

  • 24GB+ of VRAM is the recommended baseline for modern AI workloads.
  • Common choices for local use:
    • RTX 3090 – reliable used option, affordable for 24GB
    • RTX 4090 / 5090 – much faster, newer architecture
  • Anything less than 24GB quickly runs into memory limits with LLMs, image generation, or batch inference.
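
To put rough numbers on that, here's a minimal back-of-the-envelope sketch for estimating how much VRAM a model's weights need at a given precision. The parameter counts and the 30% headroom factor are illustrative assumptions, not exact requirements:

```python
# Rough VRAM estimate for holding a model's weights in memory.
# Real usage also depends on batch size, context length, and KV cache,
# so treat the headroom factor below as a loose rule of thumb.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def estimate_weight_vram_gb(num_params_b: float, dtype: str = "fp16") -> float:
    """Approximate GB of VRAM needed just to hold the weights."""
    return num_params_b * 1e9 * BYTES_PER_PARAM[dtype] / 1024**3

for model, params_b in [("Qwen 1.5 7B", 7.0), ("LLaMA 3 8B", 8.0), ("Mixtral 8x7B", 46.7)]:
    weights = estimate_weight_vram_gb(params_b)
    # Add ~30% headroom for activations and KV cache (an assumption).
    print(f"{model}: ~{weights:.1f} GB weights (fp16), plan for ~{weights * 1.3:.1f} GB")
```

Run it and you'll see why 24GB is the usual floor: a 7B model in FP16 already needs around 13GB for the weights alone.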

What if you don’t want to buy hardware?

That’s where Northflank comes in. You can:

  1. Run A100, H100, and 4090 workloads (without owning any hardware)
  2. Pay by the hour (starting at $2.74/hour for an H100, with spot pricing available)
  3. Train, serve, and deploy models (it's a full-stack platform, not just compute)
  4. Use your own cloud (BYOC) (if you already have GPU infrastructure)

With Northflank, you get the performance of high-end GPUs without the upfront cost, cooling setup, or constant maintenance. It's designed for teams and solo developers who care about speed, cost, and flexibility.

Which GPU is best for your AI workload?

Not every GPU is built for the same kind of work. Some are perfect for fast, lightweight inference. Others are built to handle massive fine-tuning runs or full model training.

I'll give you a quick breakdown to help you match the right GPU to the job, grouped by workload type so you can find what fits based on what you're building or running:

1. Running inference (lightweight or low-latency)

If you're serving models in production or building an API that responds in real time, you’ll want something that balances speed, energy use, and cost.

  • Recommended GPUs: NVIDIA L4, A10G
  • Best for: Qwen 1.5 7B, DeepSeek Coder 1.3B, Whisper, image APIs
  • Why it works: Handles both batch and real-time inference smoothly
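
As a concrete illustration, here's a minimal inference sketch using the Hugging Face transformers pipeline with Whisper, one of the models listed above. The checkpoint and audio file name are example choices, and the FP16 setting assumes a CUDA GPU like an L4 or A10G:

```python
# Minimal speech-to-text inference sketch with Hugging Face transformers.
# "openai/whisper-small" and the audio file are illustrative choices.
import torch
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    # FP16 keeps latency and VRAM low on inference GPUs like the L4.
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device=device,
)

result = asr("sample_audio.wav")  # hypothetical local file
print(result["text"])
```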

2. Fine-tuning smaller models (LoRA, adapters, PEFT)

When you’re customizing open-source models or experimenting with parameter-efficient tuning, memory becomes a bigger factor, particularly for attention-heavy models.

  • Recommended GPUs: A100 (40GB/80GB), H100
  • Best for: LLaMA 2/3, Mistral-7B, custom SD models
  • Why it works: Supports DeepSpeed, FlashAttention, and LoRA out of the box
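
To make that concrete, here's a minimal LoRA setup sketch using the PEFT library, assuming a causal LM like Mistral-7B. The rank, alpha, and target modules below are common starting values, not prescriptions; target modules in particular vary by architecture:

```python
# Minimal LoRA fine-tuning setup with PEFT; hyperparameters are
# common starting points, not tuned values.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",          # example model from the list above
    torch_dtype=torch.float16,
    device_map="auto",                     # spreads layers across available GPUs
)

lora_cfg = LoraConfig(
    r=16,                                  # adapter rank: quality vs. VRAM trade-off
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections; model-dependent
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()         # typically well under 1% of total params
```

Because only the adapter weights train, this fits comfortably on a single A100 where full fine-tuning would not.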

3. Full model training (large-scale)

Training full models from scratch or running large distributed training requires substantial compute and VRAM. This is where multi-node clusters come in.

  • Recommended GPUs: Multi-node A100 or H100
  • Best for: Mixtral, DeepSeek 67B, full transformer pretraining
  • How to run: Use your own infrastructure (Bring Your Own Cloud) or launch dedicated clusters on Northflank
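
For a sense of what the distributed side looks like, here's a minimal PyTorch DistributedDataParallel skeleton. It assumes a launch via torchrun (which sets the RANK/WORLD_SIZE/LOCAL_RANK environment variables); the model is a placeholder:

```python
# Minimal multi-GPU training skeleton with PyTorch DDP.
# Launch (per node): torchrun --nnodes=2 --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")               # NCCL for GPU-to-GPU comms
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # stand-in for a real model
model = DDP(model, device_ids=[local_rank])

# ... the training loop runs identically on every rank;
# DDP synchronizes gradients automatically during backward().

dist.destroy_process_group()
```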

4. AI art and generative workloads

If you're working on image generation, video synthesis, or style transfer, you'll want GPUs that combine strong VRAM with good I/O throughput.

  • Recommended GPUs: A100, L40S, RTX 4090
  • Best for: Stable Diffusion XL, StyleGAN, text-to-video workflows
  • Why it works: High VRAM and disk I/O help with large assets and fast sampling
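
As an example, here's a minimal Stable Diffusion XL generation sketch with the diffusers library. The FP16 weights fit comfortably within 24GB, which is why a 4090 or L40S works here; the prompt is illustrative:

```python
# Minimal SDXL image-generation sketch with diffusers; assumes a CUDA GPU
# with enough VRAM to run the pipeline in FP16.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

image = pipe(
    prompt="a watercolor map of a mountain range",  # illustrative prompt
    num_inference_steps=30,
).images[0]
image.save("output.png")
```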

Match open-source models to GPU needs (cheat sheet)

By now, you’ve seen that different workloads need different kinds of GPUs. But what about specific models? If you’re working with popular open-source projects like Qwen, DeepSeek, or Stable Diffusion, here’s a quick cheat sheet to help you choose the right setup.

| Model | Type | Recommended GPU | Use case on Northflank |
| --- | --- | --- | --- |
| Qwen 1.5 7B | LLM | L4 / A10G | Inference or LoRA |
| DeepSeek Coder 6.7B | Code model | A100 | Fine-tuning |
| LLaMA 3 8B | LLM | A100 / H100 | Inference or full training |
| Mixtral 8x7B | Mixture-of-Experts | Multi-node A100 / H100 | Large-scale training |
| Stable Diffusion XL | Image generation | A100 / L40S | Inference or fine-tuning |
| Whisper | Audio model | L4 | Streaming or batch inference |

Why Northflank is built for GPU workloads beyond basic access

Once you’ve matched your model to the right GPU, the next step is getting access and running your workloads without extra setup. That’s where Northflank stands out: it goes beyond GPU access, offering a full environment designed around AI workloads.

You can:

  • Run the same project with or without a GPU. Whether you're deploying a CPU-based API or a GPU-heavy training job, it's the same setup on the same platform.
  • Access L4, A10G, A100, and H100 directly, and switch between them as your workloads grow.
  • Bring your own infrastructure from providers like CoreWeave, Lambda Labs, or your on-prem hardware.
  • Tap into spot GPUs with automatic fallback, so your jobs don’t fail when spot capacity runs out.
  • Provision in under 30 minutes, whether it's a single-node API or a multi-node distributed job.
  • Scale up and down automatically, with cost tracking and resource isolation already built in.
  • Use ready-to-go templates for Jupyter, Qwen, LLaMA, and others, including GitOps support.

In a nutshell, Northflank goes beyond providing compute: it gives you a full environment to build, train, fine-tune, and serve models without switching tools.

Platform comparison: Northflank vs other AI infrastructure tools

While Northflank gives you full-stack GPU environments, it’s also useful to compare it with other AI infrastructure platforms.

Most tools focus on giving you GPU access alone, but if you also care about things like autoscaling APIs, managed databases, or bringing your own cloud, the differences become clear.

See a quick comparison:

| Feature | Northflank | Modal | Baseten | Together AI |
| --- | --- | --- | --- | --- |
| GPU access (A100, H100, L4, etc.) | Full range of GPU options, including cloud and BYOC | Serverless GPU jobs | GPU access only | GPU clusters with H100, GB200 |
| Microservices & APIs | Built-in support | Basic API runtimes only | Not supported | Managed API endpoints |
| Databases (Postgres, Redis) | Integrated managed services | Not available | Not available | No full-service DB support |
| BYOC support | Full self-service BYOC across AWS, GCP, Azure | Not supported unless enterprise | No | Enterprise-only option |
| Secure multi-tenancy | Strong isolation and RBAC support | Limited sandboxing | Unknown | Limited visibility |
| Jobs & Jupyter support | Background jobs, scheduled tasks, notebooks | Jupyter + batch jobs only | Not supported | Jupyter and endpoints only |
| CI/CD & Git-native workflows | Git-based pipelines, preview environments | Minimal integration | Not integrated | Basic workflow support |

Common questions about AI GPUs

Once you’ve seen how different platforms compare, you might still have a few lingering questions, particularly if you're choosing a GPU for the first time or running into VRAM bottlenecks. Here’s a quick FAQ covering the most common questions developers ask.

  1. Which GPUs are best for AI?

    It depends on your workload. Use L4 for inference, A100 for fine-tuning, and H100 for large-scale model training.

  2. What is the most powerful AI GPU?

    NVIDIA's GB200 Grace Blackwell, designed for massive AI training and inference at scale, is currently the most powerful.

  3. Do you need a powerful GPU for AI?

    Only if you're training or fine-tuning large models. For inference, GPUs like L4 or A10G are usually enough.

  4. Which GPU is better for AI, NVIDIA or AMD?

    NVIDIA is preferred because of its CUDA ecosystem and better support across AI frameworks like PyTorch and TensorFlow.

  5. Which GPU is best for Stable Diffusion?

    A100 or RTX 4090. Models like SDXL and DreamBooth benefit from having at least 24GB of VRAM.

  6. How much RAM do you need for AI?

    It depends on the model size. A common rule of thumb is to budget 3–4× the model’s in-memory weight size to cover training overhead; a 7B model in FP16, for example, is around 13GB of weights before any overhead. For many use cases, 32GB of system RAM and 16–24GB of GPU VRAM is a good starting point.

  7. Why is NVIDIA best for AI?

    Tools like CUDA, cuDNN, and widespread framework support make it the default choice for most AI workloads.

  8. Does AI need CPUs or GPUs?

    Both. CPUs handle orchestration and I/O; GPUs handle model training and inference.

  9. What is the minimum GPU for deep learning?

    At least 16GB of VRAM. A10G or L4 GPUs are a practical starting point for small to medium workloads (see the quick check after this FAQ to confirm what your hardware offers).

  10. Is 8GB of GPU enough for deep learning?

    It can handle small models or inference jobs, but you’ll likely hit memory limits during training.

  11. Do AI companies use GPUs?

    Yes, most AI companies rely on GPUs for both training and inference. Northflank supports running these across multiple cloud providers or on your own infrastructure.
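
If you're unsure where your current hardware falls relative to the thresholds above, here's a quick PyTorch check of the VRAM your GPU actually offers:

```python
# Report the detected GPU and its total VRAM, using plain PyTorch.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    print(f"{props.name}: {total_gb:.1f} GB VRAM")
    # 16GB is the practical floor suggested in the FAQ above.
    print("OK for most workloads" if total_gb >= 16 else "Expect memory limits when training")
else:
    print("No CUDA GPU detected")
```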

Start running GPU workloads without the usual complexity

Remember that you don’t always need to manage your own hardware to train, fine-tune, or serve models at scale. If you're working with models like Qwen, DeepSeek, LLaMA, or Stable Diffusion, Northflank gives you an easier way to get started.

See what you can do:

  • Deploy in minutes (no local setup or manual provisioning)
  • Scale across clouds, use spot GPUs, or bring your own infrastructure
  • Run everything from CI pipelines to APIs, databases, notebooks, and AI jobs

Start building with GPUs on Northflank
