Deborah Emeni
Published 10th September 2025

How to run AI workloads on cloud GPUs (without buying hardware)

If you've worked with models like Qwen, DeepSeek, or LLaMA, you know different workloads push your GPU in different ways. Some need high VRAM just to load the model; others need steady throughput so inference doesn't slow down under load.

The challenge is this:

Getting access to the right GPU for your specific workload without spending thousands upfront on hardware, cooling, and maintenance.

That's where cloud GPU platforms like Northflank come in. You can run enterprise-grade GPUs, use your own cloud setup if you have one, and get all the tools to train, serve, and deploy your models in one place.

In this guide, I'll show you how to match your AI workload to the right cloud GPU setup and get started without owning the hardware yourself.

TL;DR: Match your AI workload to cloud GPUs + start in minutes

The key insight most developers learn quickly: VRAM matters more than clock speed for AI workloads. If a model's weights, activations, and KV cache don't fit in memory, the job fails or slows to a crawl, and no amount of extra compute makes up for it.

Quick workload matching:

  • Inference: A100 or H100 for 8B-32B parameter models with sub-second latency
  • Fine-tuning/PEFT: H100×8 or H200×8 for faster gradient sync
  • Memory-intensive jobs: B200×8 for large models requiring massive memory

Why cloud GPUs make sense:

  1. Access A100, H100, H200, and B200 on demand
  2. Pay hourly (starting at $2.74/hour for H100)
  3. Full platform for training, serving, and deployment
  4. Bring your own cloud (BYOC) if you have existing infrastructure

With Northflank, you get enterprise GPU performance without upfront costs, cooling setup, or maintenance. Built for teams and solo developers who need speed, flexibility, and cost control.

Match your AI workload to the right cloud GPU

Not every workload needs the same type of GPU. Here's how to choose the right cloud GPU configuration for what you're building:

Running inference (production APIs, real-time responses)

For serving models in production or building APIs that need real-time responses (a minimal serving sketch follows this list):

  • A100×1 (40GB): Serves 8B-parameter LLMs at approx 1,000 tokens/sec in FP16
  • H100×1 (80GB): Boosts performance to approx 1,500 tokens/sec with optimized runtimes
  • Scale up: Add more cards for larger context windows (32K tokens) or batch inference
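
To make this concrete, here's a minimal single-GPU inference sketch using vLLM, one popular serving runtime. The library choice and model name are illustrative assumptions, not Northflank requirements:

```python
# Minimal single-GPU inference sketch with vLLM (illustrative).
# Assumes the model fits in one A100/H100's VRAM in FP16.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", dtype="float16")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Batching prompts together makes better use of the GPU.
outputs = llm.generate(["Summarize KV caching in one sentence."], params)
print(outputs[0].outputs[0].text)
```

For a production API you'd typically run vLLM's OpenAI-compatible server instead of calling the library directly, but the memory math is the same.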

Fine-tuning and PEFT (LoRA, adapters, customization)

When customizing open-source models or experimenting with parameter-efficient tuning (a LoRA sketch follows this list):

  • A100×8 (40GB): 320GB aggregate VRAM for medium models
  • H100×8: 640GB with NVLink for larger base models
  • H200×8: Enhanced tensor cores and bandwidth for reduced sync overhead
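
To show what the PEFT workflow actually looks like, here's a minimal LoRA setup using Hugging Face's peft library. The library, model name, and hyperparameters are illustrative assumptions:

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT (illustrative).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base")

# Train small low-rank adapters on the attention projections
# instead of updating all 6.7B base weights.
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

Because only the adapter weights take gradients, the optimizer state stays tiny; most of the multi-GPU VRAM above goes to the frozen base model and activations.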

Full model training (when you need to train from scratch)

Training large models requires significant compute time. For an 8B-parameter transformer:

| Configuration | Estimated time | Best for |
| --- | --- | --- |
| H100×8 | approx 2.85 years continuous | Research projects |
| H200×8 | approx 2.3 years continuous | 20% faster with improved tensor cores |
| B200×8 | approx 2.85 years continuous | Memory-intensive large-batch training |

Reality check: Most teams fine-tune existing checkpoints rather than training from scratch due to time and cost requirements.
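
Where do estimates like these come from? A common back-of-envelope heuristic is that training costs roughly 6 FLOPs per parameter per token. The token count, peak throughput, and utilization below are assumptions chosen to land near the table's H100×8 figure; real runs vary widely:

```python
# Back-of-envelope training time via the ~6 * params * tokens FLOPs rule.
# All inputs are assumptions for illustration; real runs vary widely.
params = 8e9              # 8B-parameter transformer
tokens = 6e12             # assumed training tokens
flops_needed = 6 * params * tokens

h100_bf16 = 989e12        # per-GPU dense BF16 peak, FLOP/s
mfu = 0.40                # assumed model FLOPs utilization
cluster_flops = 8 * h100_bf16 * mfu

years = flops_needed / cluster_flops / (365 * 24 * 3600)
print(f"~{years:.2f} years on H100×8")  # ~2.9 years with these assumptions
```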

If you're working with specific open-source models, here's how to match them to cloud GPU setups:

| Model | Task | Recommended setup | Why this works |
| --- | --- | --- | --- |
| Qwen 1.5 7B | Inference | A100×1-2, H100×1 | Fits in 80GB VRAM, sub-second responses |
| DeepSeek Coder 6.7B | Fine-tuning | A100×4-8, H100×4-8 | Perfect for LoRA and adapter workflows |
| LLaMA 3 8B | All stages | A100×2 (inference), 4-8 (tuning) | Flexible across different tasks |
| Mixtral 8×7B | Fine-tuning | H100×4-8, H200×8 | Handles MoE gating and memory spikes |
| Stable Diffusion XL | Inference/fine-tuning | A100×2, H100×2 | Large image batches, fast sampling |
| Whisper | Real-time inference | A100×1 | Low-latency audio processing |
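
A quick way to sanity-check the "Why this works" column: in FP16/BF16, weights take about 2 bytes per parameter, plus headroom for the KV cache and activations. The 20% overhead factor below is a rough assumption:

```python
# Rough serving-VRAM estimate: 2 bytes/param in FP16 plus an assumed ~20%
# headroom for KV cache, activations, and runtime buffers.
def serving_vram_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    weights_gb = params_billion * bytes_per_param
    return weights_gb * 1.2

for name, size_b in [("Qwen 1.5 7B", 7), ("LLaMA 3 8B", 8), ("Mixtral 8x7B (~47B total)", 47)]:
    print(f"{name}: ~{serving_vram_gb(size_b):.0f} GB")
# Qwen and LLaMA fit on one 40-80GB card; Mixtral's ~113 GB needs multiple GPUs.
```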

Getting started with Northflank for AI workloads

Northflank goes beyond raw GPU access: it's a complete platform designed for AI development workflows.

Immediate access:

  • Deploy GPU workloads in under 30 minutes
  • Switch between A100, H100, H200, and B200 as needs change
  • Access through web interface, CLI, or API

Cost optimization:

  • Hourly pricing with spot GPU options
  • Automatic scaling up and down
  • Resource isolation and usage tracking
  • Hibernation for long-running jobs

Full development environment:

  • Integrated databases (Postgres, Redis)
  • CI/CD pipelines with Git integration
  • Jupyter notebooks and development tools
  • Templates for popular frameworks (PyTorch, TensorFlow); a quick GPU sanity check follows this list
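
Once a workload is deployed, a quick check from any of those tools confirms the GPU is actually visible to your framework (PyTorch shown here; the idea is the same in any framework):

```python
# Quick check that the container actually sees the GPU it was allocated.
import torch

assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"
print(torch.cuda.get_device_name(0))           # e.g. "NVIDIA H100 80GB HBM3"
free, total = torch.cuda.mem_get_info()
print(f"{free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```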

Flexibility options:

  • Use Northflank's managed cloud
  • Bring your own cloud (AWS, GCP, Azure)
  • Connect existing GPU infrastructure
  • Automatic fallback when spot capacity runs out

Platform comparison: Why Northflank for AI workloads

While many platforms offer GPU access, Northflank provides a complete development environment:

| Need | Northflank solution | Alternative platforms |
| --- | --- | --- |
| Quick GPU access | A100, H100, H200, B200 on demand | Most provide basic GPU access |
| Development tools | Integrated Jupyter, databases, APIs | Usually requires separate services |
| Cost control | Spot pricing, auto-scaling, hibernation | Limited cost optimization |
| Your own infrastructure | Full BYOC across all major clouds | Enterprise-only or not available |
| Production deployment | Built-in CI/CD, monitoring, scaling | Requires additional tooling |

Common questions about running AI on cloud GPUs

  1. How much does it cost to run AI workloads on cloud GPUs?

    Starting at $2.74/hour for H100 access, with spot pricing available for additional savings. You only pay for actual usage.

  2. Can I bring my own cloud infrastructure?

    Yes, Northflank supports BYOC across AWS, GCP, and Azure, letting you use existing credits or infrastructure while getting the platform benefits.

  3. What if I need to scale beyond single GPUs?

    Northflank handles multi-GPU setups automatically, with NVLink support for high-bandwidth communication between GPUs.

  4. How quickly can I get started?

    Most workloads can be deployed within 30 minutes, including environment setup and initial model deployment.

  5. Do I need to manage infrastructure?

    No, Northflank handles provisioning, scaling, monitoring, and maintenance automatically.

Start running your AI workloads today

Instead of waiting weeks for hardware procurement or dealing with setup complexity, you can start developing with enterprise-grade GPUs immediately.

Get started with Northflank:

  • Choose your GPU type based on your workload
  • Deploy using templates or bring your existing code
  • Scale automatically as your needs grow
  • Pay only for what you use

Whether you're fine-tuning your first model or deploying production AI services, Northflank gives you the infrastructure you need without the operational overhead.

Start building with GPUs on Northflank →
