

How to run AI workloads on cloud GPUs (without buying hardware)
If you've worked with models like Qwen, DeepSeek, or LLaMA, you know different workloads push your GPU in different ways. Some need enough VRAM just to load the model, while others need sustained throughput so inference doesn't slow down under load.
The challenge is this:
Getting access to the right GPU for your specific workload without spending thousands upfront on hardware, cooling, and maintenance.
That's where cloud GPU platforms like Northflank come in. You can run enterprise-grade GPUs, use your own cloud setup if you have one, and get all the tools to train, serve, and deploy your models in one place.
In this guide, I'll show you how to match your AI workload to the right cloud GPU setup and get started without owning the hardware yourself.
The key insight most developers learn quickly: VRAM matters more than clock speed for AI workloads. A model's weights, KV cache, and activations all have to fit in GPU memory before raw compute speed even becomes relevant.
Quick workload matching:
- Inference: A100 or H100 for 8B-32B parameter models with sub-second latency
- Fine-tuning/PEFT: H100×8 or H200×8 for faster gradient sync
- Memory-intensive jobs: B200×8 for large models requiring massive memory
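Before picking a card, it helps to estimate whether a model fits in VRAM at all. A minimal sketch, assuming FP16 weights (2 bytes per parameter) and a rough 20% overhead factor for KV cache and activations (the overhead is a ballpark assumption, not a measured value):

```python
def vram_needed_gb(params_billion: float, bytes_per_param: int = 2,
                   overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight memory at the given precision,
    scaled by a headroom factor for KV cache and activations."""
    weights_gb = params_billion * bytes_per_param  # 1B params × 2 bytes = 2GB
    return weights_gb * overhead

# An 8B model in FP16:
print(f"{vram_needed_gb(8):.1f} GB")   # 19.2 GB -> fits an A100 40GB
# A 32B model in FP16:
print(f"{vram_needed_gb(32):.1f} GB")  # 76.8 GB -> needs an H100 80GB
```

Quantizing to 8-bit or 4-bit (1 or 0.5 bytes per parameter) shifts these numbers down accordingly, which is why larger models can sometimes run on smaller cards.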
Why cloud GPUs make sense:
- Access A100, H100, H200, and B200 on demand
- Pay hourly (starting at $2.74/hour for H100)
- Full platform for training, serving, and deployment
- Bring your own cloud (BYOC) if you have existing infrastructure
With Northflank, you get enterprise GPU performance without upfront costs, cooling setup, or maintenance. Built for teams and solo developers who need speed, flexibility, and cost control.
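Hourly billing makes total spend easy to project before you launch a job. A quick sketch using the H100 rate quoted above (the 40% spot discount in the second example is a hypothetical illustration, not a quoted figure):

```python
def job_cost(hours: float, rate_per_hour: float = 2.74,
             num_gpus: int = 1, spot_discount: float = 0.0) -> float:
    """Projected cost of a GPU job billed hourly."""
    return hours * rate_per_hour * num_gpus * (1 - spot_discount)

# A 48-hour fine-tuning run on a single H100:
print(f"${job_cost(48):.2f}")  # $131.52
# The same run on 8 GPUs with a hypothetical 40% spot discount:
print(f"${job_cost(48, num_gpus=8, spot_discount=0.4):.2f}")  # $631.30
```

Compare that against the four- to five-figure upfront cost of buying a single H100, before power and cooling.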
Not every workload needs the same type of GPU. Here's how to choose the right cloud GPU configuration for what you're building:
For serving models in production or building APIs that need real-time responses:
- A100×1 (40GB): Serves 8B-parameter LLMs at approx 1,000 tokens/sec in FP16
- H100×1 (80GB): Boosts performance to approx 1,500 tokens/sec with optimized runtimes
- Scale up: Add more cards for larger context windows (32K tokens) or batch inference
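Aggregate tokens/sec translates directly into serving capacity. A back-of-envelope sketch using the throughput figures above (average response length and latency budget are illustrative assumptions):

```python
def concurrent_requests(tokens_per_sec: float, avg_output_tokens: int,
                        target_latency_s: float) -> int:
    """How many requests can share one GPU while each still finishes
    its full response within the latency budget."""
    per_request_rate = avg_output_tokens / target_latency_s  # tok/s each request needs
    return int(tokens_per_sec // per_request_rate)

# H100 at ~1,500 tok/s, 256-token responses, 2-second budget:
print(concurrent_requests(1500, 256, 2.0))  # 11 concurrent requests
# A100 at ~1,000 tok/s under the same assumptions:
print(concurrent_requests(1000, 256, 2.0))  # 7 concurrent requests
```

If your traffic exceeds that concurrency, you add cards or batch requests rather than buying a faster single GPU.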
When customizing open-source models or experimenting with parameter-efficient tuning:
- A100×8 (40GB): 320GB aggregate VRAM for medium models
- H100×8: 640GB with NVLink for larger base models
- H200×8: Enhanced tensor cores and bandwidth for reduced sync overhead
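The reason parameter-efficient tuning fits on these setups is that LoRA trains only small low-rank adapter matrices, not the full model. A sketch of the standard LoRA parameter count (the dimensions below are illustrative, roughly shaped like an 8B-class transformer):

```python
def lora_trainable_params(hidden_dim: int, rank: int,
                          adapted_matrices_per_layer: int,
                          num_layers: int) -> int:
    """Each adapted weight matrix gets two low-rank factors, A (d×r)
    and B (r×d), so 2·d·r extra trainable parameters per matrix."""
    return 2 * hidden_dim * rank * adapted_matrices_per_layer * num_layers

# Rank-16 adapters on 4 attention projections across 32 layers, d=4096:
p = lora_trainable_params(4096, 16, 4, 32)
print(f"{p / 1e6:.1f}M trainable params")  # 16.8M, vs ~8B for full fine-tuning
```

Optimizer state scales with trainable parameters, which is why LoRA runs fit in a fraction of the VRAM that full fine-tuning needs.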
Training large models requires significant compute time. For an 8B-parameter transformer:
| Configuration | Estimated Time | Best For |
| --- | --- | --- |
| H100×8 | approx 2.85 years continuous | Research projects |
| H200×8 | approx 2.3 years continuous | ~20% faster with improved tensor cores |
| B200×8 | approx 2.85 years continuous | Memory-intensive large-batch training |
Reality check: Most teams fine-tune existing checkpoints rather than training from scratch due to time and cost requirements.
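Multi-year figures like these come from a standard back-of-envelope: total training compute is roughly 6 × parameters × tokens FLOPs, divided by the throughput the cluster actually delivers. A sketch with assumed inputs (6T training tokens, ~989 TFLOPS BF16 peak per H100, 40% utilization are illustrative assumptions, not Northflank figures):

```python
def training_days(params: float, tokens: float, num_gpus: int,
                  peak_flops_per_gpu: float, mfu: float = 0.4) -> float:
    """Estimated wall-clock days via the ~6·N·D FLOPs rule of thumb.
    mfu = fraction of peak FLOPs actually achieved in practice."""
    total_flops = 6 * params * tokens
    flops_per_sec = num_gpus * peak_flops_per_gpu * mfu
    return total_flops / flops_per_sec / 86_400  # seconds per day

# 8B params, 6T tokens, H100×8:
days = training_days(8e9, 6e12, 8, 989e12)
print(f"~{days / 365:.1f} years")  # ~2.9 years, in the ballpark of the table above
```

The same arithmetic explains why frontier labs use thousands of GPUs: scaling `num_gpus` is the only lever that turns years into weeks.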
If you're working with specific open-source models, here's how to match them to cloud GPU setups:
| Model | Task | Recommended Setup | Why This Works |
| --- | --- | --- | --- |
| Qwen 1.5 7B | Inference | A100×1-2, H100×1 | Fits in 80GB VRAM, sub-second responses |
| DeepSeek Coder 6.7B | Fine-tuning | A100×4-8, H100×4-8 | Well suited to LoRA and adapter workflows |
| LLaMA 3 8B | All stages | A100×2 (inference), ×4-8 (tuning) | Flexible across different tasks |
| Mixtral 8×7B | Fine-tuning | H100×4-8, H200×8 | Handles MoE gating and memory spikes |
| Stable Diffusion XL | Inference/Fine-tuning | A100×2, H100×2 | Large image batches, fast sampling |
| Whisper | Real-time inference | A100×1 | Low-latency audio processing |
Northflank goes beyond raw GPU access: it's a complete platform designed for AI development workflows.
Immediate access:
- Deploy GPU workloads in under 30 minutes
- Switch between A100, H100, H200, and B200 as needs change
- Access through web interface, CLI, or API
Cost optimization:
- Hourly pricing with spot GPU options
- Automatic scaling up and down
- Resource isolation and usage tracking
- Hibernation for long-running jobs
Full development environment:
- Integrated databases (Postgres, Redis)
- CI/CD pipelines with Git integration
- Jupyter notebooks and development tools
- Templates for popular frameworks (PyTorch, TensorFlow)
Flexibility options:
- Use Northflank's managed cloud
- Bring your own cloud (AWS, GCP, Azure)
- Connect existing GPU infrastructure
- Automatic fallback when spot capacity runs out
While many platforms offer GPU access, Northflank provides a complete development environment:
| Need | Northflank Solution | Alternative Platforms |
| --- | --- | --- |
| Quick GPU access | A100, H100, H200, B200 on demand | Most provide basic GPU access |
| Development tools | Integrated Jupyter, databases, APIs | Usually requires separate services |
| Cost control | Spot pricing, auto-scaling, hibernation | Limited cost optimization |
| Your own infrastructure | Full BYOC across all major clouds | Enterprise-only or not available |
| Production deployment | Built-in CI/CD, monitoring, scaling | Requires additional tooling |
How much does it cost to run AI workloads on cloud GPUs?
Starting at $2.74/hour for H100 access, with spot pricing available for additional savings. You only pay for actual usage.
Can I bring my own cloud infrastructure?
Yes, Northflank supports BYOC across AWS, GCP, and Azure, letting you use existing credits or infrastructure while getting the platform benefits.
What if I need to scale beyond single GPUs?
Northflank handles multi-GPU setups automatically, with NVLink support for high-bandwidth communication between GPUs.
How quickly can I get started?
Most workloads can be deployed within 30 minutes, including environment setup and initial model deployment.
Do I need to manage infrastructure?
No, Northflank handles provisioning, scaling, monitoring, and maintenance automatically.
Instead of waiting weeks for hardware procurement or dealing with setup complexity, you can start developing with enterprise-grade GPUs immediately.
Get started with Northflank:
- Choose your GPU type based on your workload
- Deploy using templates or bring your existing code
- Scale automatically as your needs grow
- Pay only for what you use
Whether you're fine-tuning your first model or deploying production AI services, Northflank gives you the infrastructure you need without the operational overhead.