Deborah Emeni
Published 27th June 2025

What are AWS Spot Instances? Guide to lower cloud costs and avoid downtime

I saw a Reddit thread the other day asking if anyone uses AWS Spot Instances, and most of the replies were “yeah, but only for jobs I don’t mind getting interrupted.”

That caution is understandable. Spot Instances (on AWS and other clouds) are much cheaper than regular ones: AWS advertises discounts of up to 90% off On-Demand prices, and similar pricing models exist on Google Cloud and Azure.

However, the cheap pricing comes with a condition: AWS, for example, can shut them down with just a couple of minutes’ notice whenever the capacity is needed elsewhere.

So for a lot of teams, it ends up feeling more like a gamble than a tool.

The question isn’t whether Spot Instances have potential. It’s how to use them without breaking things when providers like AWS shut an instance down with little warning.

That’s what this article walks through. You’ll learn:

  • What Spot Instances are (in plain terms)
  • Why they cost less, and how that affects your setup
  • How Spot Instances compare to On-Demand, Reserved Instances, and Savings Plans
  • What kinds of workloads are a good fit (and when to avoid them)
  • How Spot interruptions work across cloud providers
  • How teams run workflows on Spot Instances without writing failover logic
  • And a step-by-step guide to running Spot Instances using Northflank, with fallback to On-Demand

TL;DR: Spot Instances at a glance

If you’re short on time, here’s a quick summary of what this article covers:

  • What are Spot Instances?

    Spot Instances are spare EC2 capacity that AWS sells at up to 90% off, but they can be shut down at any time.

    THE SOLUTION → Platforms like Northflank handle automated fallback from Spot to On-Demand, so your workloads keep running without interruption.

  • Why are they cheaper than On-Demand or Reserved?

    They run on idle AWS infrastructure with no uptime guarantee.

  • When should you use Spot Instances?

    They’re great for flexible workloads: ML training, CI/CD pipelines, batch jobs, and rendering.

  • How do Spot Instance interruptions work?

    AWS gives a 2-minute warning before reclaiming the instance. Without a fallback strategy, anything still running when that window closes is cut off.

  • What are the best practices for using Spot safely?

    Use Auto Scaling, Spot Fleets, flexible instance types, and logic to fall back when needed.

  • How does Northflank help?

    Northflank lets you assign specific workloads to Spot Instances and automatically falls back to On-Demand when capacity runs out, across clouds and regions, without requiring custom fallback scripts or additional tooling.

What are Spot Instances?

Spot Instances are spare EC2 virtual machines that AWS isn’t using right now. AWS rents them out at a steep discount, sometimes up to 90% off. It’s a clever way for AWS to make use of idle capacity while giving you access to cheaper compute.

So yes, it’s cheap, but at what cost?

AWS can reclaim a Spot Instance whenever it needs that capacity elsewhere. You typically get a 2-minute warning before shutdown, though some providers or setups may give as little as 30 seconds. Either way, the interruption is sudden.
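To make the 2-minute warning concrete, here is a minimal sketch of how a process running on an AWS Spot Instance could check for it. It assumes the EC2 instance metadata service (IMDSv2) and its `spot/instance-action` path, which returns a 404 until an interruption is scheduled. The function names are illustrative, and this only covers AWS; other providers expose their notices differently.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Link-local address of the EC2 instance metadata service (AWS-specific).
IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=2.0):
    """Return the raw interruption notice, or None if none is pending.

    Only works when called from inside an EC2 instance.
    """
    # IMDSv2 requires a short-lived session token before any read.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    notice_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(notice_req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled yet
            return None
        raise


def seconds_until_interruption(body, now=None):
    """Parse a notice such as {"action": "terminate", "time": "...Z"}."""
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return notice["action"], (when - now).total_seconds()
```

A worker would typically call `fetch_instance_action()` every few seconds and, once it returns a notice, use the remaining time to save progress and stop taking new work.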

Picture it like this: you grab a table at a restaurant that isn’t fully booked. You’re seated quickly and pay less, but if someone with a reservation shows up, you’re getting bumped.

See the illustration below to get a clearer picture of how Spot Instances fit into the broader EC2 capacity model:

Figure: How AWS allocates EC2 capacity across Reserved, On-Demand, and Spot Instances. (Diagram: Reserved Instances have priority, On-Demand Instances sit in the middle, and Spot Instances have the lowest priority and are reclaimable by AWS.)

That’s why Spot Instances are best suited for:

  • flexible jobs like ML training or batch data processing
  • stateless tasks
  • short-lived or fault-tolerant workloads

PS: You definitely don’t want to run your main production database on Spot.

Spot is risky for stateful workloads or critical production services like APIs. If the instance is preempted, your app can go offline with little warning.

What’s the difference between Spot, On-Demand, and Reserved Instances?

Now that you know how Spot Instances work, it's helpful to compare them with other EC2 pricing models.

AWS doesn’t change the infrastructure underneath; it's only the cost, availability, and level of commitment that change.

Choosing the right one comes down to your type of workload, reliability needs, and budget.

Let’s see a breakdown to help you decide:

| Type | Price | Uptime guarantee | Commitment | Instance flexibility | Best use cases |
| --- | --- | --- | --- | --- | --- |
| Spot Instances | Up to 90% cheaper than On-Demand | No | None (availability depends on long-term supply and demand) | Limited to instance types and capacity currently available | ML training, large-scale batch processing, CI pipelines, rendering jobs, other fault-tolerant or short-lived workloads |
| On-Demand | Standard pay-as-you-go rate | Yes | None (you only pay while the instance is running) | Fully flexible: launch any available instance at any time | APIs, web servers, short-term dev/test, unpredictable usage patterns |
| Reserved | ~40–72% cheaper than On-Demand (depending on term/payment) | Yes | 1- or 3-year term (no change mid-way) | Limited: tied to a specific instance family, region, and OS | Consistent production workloads (e.g. backend services, databases) with stable usage |
| Savings Plans | Similar savings to Reserved, but more flexible | Yes | 1- or 3-year $/hour commitment (flexible usage) | High: applies across EC2, Fargate, and Lambda | Apps with steady usage but less predictable infrastructure setup (e.g. hybrid container + serverless) |
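To get a feel for what those percentages mean on a bill, here is a small worked example. The $1.00 hourly On-Demand rate is a made-up number chosen to keep the arithmetic easy, and the discounts are the range ends quoted in the table, so treat the Spot figure as a best case rather than a typical price.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(on_demand_hourly, discount, hours=HOURS_PER_MONTH):
    """Monthly cost of one instance at a fractional discount (0.90 = 90% off)."""
    return round(on_demand_hourly * (1 - discount) * hours, 2)


# Hypothetical On-Demand rate in USD per hour, for illustration only.
ON_DEMAND_HOURLY = 1.00

# Discounts taken from the comparison table above.
scenarios = {
    "On-Demand": 0.00,
    "Reserved (low end)": 0.40,
    "Reserved (high end)": 0.72,
    "Spot (best case)": 0.90,
}

for name, discount in scenarios.items():
    cost = monthly_cost(ON_DEMAND_HOURLY, discount)
    print(f"{name:<20} ${cost:,.2f} per month")
```

With these assumptions, an instance that runs all month costs $730.00 On-Demand, between $204.40 and $438.00 Reserved, and $73.00 on Spot at the maximum advertised discount.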

How Spot Instance pricing and discounts work across AWS, GCP, and Azure

If you're planning to use Spot Instances as part of your cost strategy, it's helpful to understand how pricing and long-term discounts differ across cloud providers.

The table below compares how AWS, GCP, and Azure approach Spot pricing and flexible commitment models.

| Cloud provider | Spot Instance name | Discounted commitment option | How it works | Flexibility |
| --- | --- | --- | --- | --- |
| AWS | Spot Instances | Savings Plans / Reserved Instances | Spot Instances use unused capacity with up to 90% discount. Savings Plans offer cost reduction across EC2, Fargate, and Lambda. Reserved Instances lock in usage for specific instance types and regions for 1 or 3 years. | Savings Plans are flexible across instance types and services. Reserved Instances are locked to family, region, and OS. |
| GCP | Spot VMs (formerly Preemptible VMs) | Committed Use Discounts (CUDs) / FlexCUDs | Spot VMs offer up to ~80% off standard pricing and can be interrupted anytime. FlexCUDs provide up to roughly 45% off with fewer restrictions than standard CUDs. | FlexCUDs apply across VM types and regions. Standard CUDs are tied to specific SKUs. |
| Azure | Spot Virtual Machines | Azure Savings Plans / Reserved Instances | Azure Spot lets you run workloads on unused capacity at a discount. Azure Savings Plans commit to a $/hour spend across services for 1 or 3 years. Reserved Instances are tied to specific VM types and regions. | Azure Savings Plans offer more flexibility than Reserved VMs, but less than on-demand. |

When should you use Spot Instances (and when should you avoid them)?

It depends on how well your workload tolerates interruptions and whether it needs to keep running without delays or downtime.

Let’s break it down.

When Spot Instances are useful

Spot Instances are useful when you’re running:

  • Machine learning training: These jobs often run in parallel across multiple nodes and can handle interruptions by restarting or picking up from checkpoints. Training large models is compute-heavy, and Spot helps you save significantly on cost.
  • CI/CD pipelines: Build and test jobs are short-lived and stateless. If a node goes away, the pipeline can rerun or retry the job without major consequences. Spot works well here since you’re not relying on long-term uptime.
  • Rendering: Tasks like 3D rendering, video encoding, or animation can be broken into smaller, parallel workloads. If one part fails or gets interrupted, it can be requeued without affecting the rest.
  • Batch jobs and analytics: Large data processing jobs, such as log aggregation, report generation, or ETL pipelines, can be split up and retried. These jobs don’t need to be always-on and are usually tolerant to retries or delays.
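What these workloads have in common is that they can pick up where they left off. Here is a minimal sketch of that checkpoint-and-resume pattern. The file layout and function names are made up for illustration, and it assumes the checkpoint lives on storage that outlives the instance (a network volume or a synced bucket, for example).

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the next step to run, or 0 when no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_step"]


def save_checkpoint(path, next_step):
    """Write to a temp file, then rename, so a cut-off write can't corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_step": next_step}, f)
    os.replace(tmp, path)


def run_job(path, total_steps, do_step, should_stop=lambda: False):
    """Run the remaining steps. Returns True when done, False if interrupted."""
    step = load_checkpoint(path)
    while step < total_steps:
        if should_stop():  # e.g. an interruption notice has arrived
            return False
        do_step(step)
        step += 1
        save_checkpoint(path, step)
    return True


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    done = []
    # First run is "interrupted" after three steps; the second run resumes.
    run_job(ckpt, 5, done.append, should_stop=lambda: len(done) >= 3)
    run_job(ckpt, 5, done.append)
    print(done)
```

Each step runs exactly once across both runs, which is what keeps an interruption from costing more than the step that was in flight.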

When you should avoid using Spot Instances

Spot isn’t the best fit when you’re running workloads that need guaranteed uptime or can’t afford to be interrupted in the middle of execution. For example:

  • Stateful apps: If your application relies on session data, caches, or keeps user state in memory, interruptions can break the experience or cause data loss. These apps need consistent availability, which Spot doesn’t guarantee.
  • Databases: Running a primary database on Spot is risky unless you’ve built in high availability or replication. Losing a database node mid-query or during a transaction can corrupt data or trigger downtime.
  • Customer-facing services without redundancy: If your app or API is directly serving users and you don’t have replicas or failover strategies, interruptions can lead to visible outages. Spot is too unpredictable unless you’ve built in the layers to recover instantly.

NOTE

If your workload can fail without causing issues, Spot is a sensible cost-saving move. If not, use On-Demand or pair Spot with a fallback strategy using platforms like Northflank so you're not caught off guard when AWS reclaims capacity.

How an AI company used Northflank to run Spot Instances reliably (Must-read)

Earlier, we looked at how Spot Instances can work well for flexible workloads, as long as you have a way to recover when AWS takes the capacity back. Without that, teams often end up running into unexpected job failures.

That was the case for an AI voice company called Weights.

They were running GPU-heavy machine learning jobs that were well-suited for Spot: parallel tasks that didn’t need to run continuously.

The setup made sense on paper, but losing capacity in the middle of a run kept getting in the way.

With Northflank, they simply tagged jobs to Spot pools, and the platform handled the rest.

When Spot capacity was reclaimed, jobs automatically moved to On-Demand without needing manual configuration or infrastructure code.

It also worked across different cloud providers and zones, so they didn’t need to worry about availability gaps.

How to use Northflank to run jobs on Spot Instances with automatic fallback (step-by-step)

The Weights team didn’t have to write fallback logic from scratch. They used built-in scheduling features in Northflank to reliably run Spot jobs without maintaining additional scripts or infrastructure tooling.

Northflank makes Spot Instances practical by giving you control over where workloads run and how they respond when Spot capacity becomes unavailable. You can assign cost-sensitive jobs to Spot node pools, rely on automatic fallback to On-Demand, and scale across regions or clouds without writing custom failover scripts.

You can apply the same approach to your own cluster by following the steps below.

1. Start by creating a Northflank account

If you’re new to the platform, go to app.northflank.com/signup to get started.

Once you're signed in, the Introduction to Northflank guide explains how projects, services, jobs, and environments work so you can understand how Northflank fits into your infrastructure setup.

Figure: Sign up for a Northflank account to get started with Spot workload scheduling. (Screenshot: Northflank signup screen with fields for username, email, and password, and options to sign up with Google, GitHub, or GitLab.)

2. Connect your Kubernetes cluster

To start using Spot Instances, you’ll need to bring your own cluster from AWS, GCP, or another cloud provider.

Northflank guides you through connecting your cluster and installing the required components to enable deployments. Once set up, you can securely deploy builds, jobs, or services to your infrastructure.

Follow the workload deployment guide to:

  • Connect your cluster to Northflank
  • Set up access to your cloud provider
  • Schedule workloads to your nodes

3. Create separate node pools for Spot and On-Demand

Inside your cluster settings in the Northflank UI, you can create dedicated node pools.

For the Spot node pool:

  • Select Spot-capable instance types
  • Enable the "Use spot instances" setting
  • Optionally restrict the pool to only allow builds or jobs

For the On-Demand node pool:

  • Leave the Spot option disabled
  • Allow long-running services and fallback jobs to run here

This is what it looks like when setting up a Spot node pool in Northflank:

Figure: Creating a dedicated node pool with Spot Instances enabled in the Northflank UI. (Screenshot: Northflank cluster settings screen showing node pool creation with Spot Instances enabled and scheduling options visible.)

You can follow the full node pool creation guide for more detail.

4. Add labels to your node pools

To control where workloads run, add labels to your node pools during creation. Labels are defined as name–value pairs and help guide workload scheduling decisions. For example:

  • resourceType: highCPU for compute-optimized pools
  • availabilityZone: 1a to assign workloads to a specific zone

You can set as many labels as you need by expanding the Advanced section when configuring your node pool in the Northflank UI.

Once labels are in place, you can tag your workloads with matching values to influence placement. For example, a job tagged with resourceType: highCPU will be scheduled onto a node pool with the same label.
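The matching idea can be sketched in a few lines. This is only an illustration of label matching in general, not Northflank's scheduler: the pool names and the `pool_matches` and `eligible_pools` helpers are invented, and the label values are the ones from the example above.

```python
def pool_matches(pool_labels, workload_tags):
    """A pool is eligible when it carries every tag the workload asks for."""
    return all(pool_labels.get(key) == value for key, value in workload_tags.items())


def eligible_pools(pools, workload_tags):
    """Return the names of all pools whose labels satisfy the workload's tags."""
    return [name for name, labels in pools.items() if pool_matches(labels, workload_tags)]


# Hypothetical node pools and their labels.
pools = {
    "spot-compute": {"resourceType": "highCPU", "availabilityZone": "1a"},
    "general": {"resourceType": "general", "availabilityZone": "1a"},
}

print(eligible_pools(pools, {"resourceType": "highCPU"}))
print(eligible_pools(pools, {"availabilityZone": "1a"}))
```

The first call selects only the compute-optimized pool, while the second selects both pools because they share the same zone label. A workload that lists several tags only lands on pools that satisfy all of them.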

The screenshot below shows how to configure a node pool with a resourceType: highCPU label in the Northflank UI:

Figure: Label configuration in the Northflank UI for targeting workloads to a specific node pool. (Screenshot: Northflank interface showing node pool settings with a label named resourceType and value highCPU.)

You can find more details on how to label node pools and influence workload scheduling in the full node pool labeling guide.

5. Tag your workloads to run on Spot pools

Once your node pools are labeled, head to your workload settings and tag the jobs you want to run on Spot instances.

These tags will match against your Spot pool’s labels. Northflank then attempts to deploy those jobs to the Spot node pool first. If there’s no available capacity, it will fall back to a matching On-Demand pool.
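Conceptually, "Spot first, then On-Demand" is a simple ordering rule. The sketch below shows that rule in isolation. It is a hypothetical model and not Northflank's implementation: the `Pool` class, the `free_slots` counter, and the `place` function are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    spot: bool
    free_slots: int  # simplified stand-in for available capacity


class NoCapacityError(Exception):
    """Raised when neither Spot nor On-Demand pools can take the job."""


def place(job, pools):
    """Try Spot pools first, then fall back to On-Demand pools."""
    # False sorts before True, so Spot pools come first in the ordering.
    for pool in sorted(pools, key=lambda p: not p.spot):
        if pool.free_slots > 0:
            pool.free_slots -= 1
            return pool.name
    raise NoCapacityError(f"no capacity available for {job}")


pools = [
    Pool("on-demand-pool", spot=False, free_slots=5),
    Pool("spot-pool", spot=True, free_slots=1),
]

print(place("job-a", pools))  # takes the last free Spot slot
print(place("job-b", pools))  # Spot is full, so this lands on On-Demand
```

The point of handing this to a platform is that you don't maintain this logic yourself, but it helps to know what behavior to expect: the cheaper pool is preferred, and the more expensive one is used only when it has to be.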

See how to deploy workloads using node pool tags and labels

Nice work! If you followed these steps and used the docs to guide setup, your team now has a reliable way to run cost-sensitive jobs on Spot instances with fallback to On-Demand.

That means your team no longer has to manage failover logic or handle unexpected Spot interruptions manually.

Should you start using Spot Instances? (Wrapping up)

If your workloads can handle interruptions, Spot Instances are one of the simplest ways to lower cloud costs.

At the same time, you might still need the stability of On-Demand Instances, particularly for production workflows or time-sensitive jobs.

Platforms like Northflank let you use both Spot and On-Demand. Northflank assigns cost-sensitive workloads to Spot Instances and automatically switches to On-Demand when capacity runs out.

You don’t need to write fallback scripts or set up custom infrastructure, and it works across regions and clouds.

Try it on Northflank if you want to run Spot Instances with less manual setup and more reliability.

FAQs about Spot Instances (12 questions answered)

Below are answers to some of the most commonly asked questions about Spot Instances.

  1. What is a Spot Instance?

    A Spot Instance is a virtual machine that uses unused cloud capacity, most commonly in AWS. It comes at a lower price but can be interrupted at any time when the provider needs that capacity back. Platforms like Northflank let you use Spot Instances for jobs or builds, with automatic fallback to On-Demand instances if capacity runs out.

  2. What is a Spot Instance in simple words?

    It's a cheaper cloud server you can use when there's spare capacity, but the provider can take it back anytime. With platforms like Northflank, you don’t need to handle that interruption logic yourself.

  3. How much cheaper are Spot Instances?

    They can be up to 90% cheaper than On-Demand instances, depending on the region and instance type. Northflank helps you take advantage of this by assigning cost-sensitive workloads to Spot node pools automatically.

  4. Are Spot Instances worth it?

    Yes, especially if you're running flexible or retryable workloads like CI pipelines, batch jobs, or machine learning training. Northflank supports these kinds of jobs with automatic rescheduling if a Spot instance is reclaimed.

  5. How long do Spot Instances last?

    There’s no fixed duration. A Spot Instance might run for hours or get interrupted after a few minutes. Platforms like Northflank let you schedule jobs with fallback to ensure they still run even if a Spot pool becomes unavailable.

  6. Can Spot Instances be stopped?

    Yes. You can stop or terminate them from your side, but the provider (like AWS) can also terminate them with a short warning. With Northflank, if this happens, your jobs are automatically retried on an On-Demand pool.

  7. What is the difference between Spot Instances and On-Demand Instances?

    On-Demand Instances are reliable and uninterrupted, but come at a higher cost. Spot Instances are cheaper but can be reclaimed at any time. Northflank helps you combine both: use Spot first, then fall back to On-Demand only if needed.

  8. What is the difference between Spot Instances and Reserved Instances?

    Reserved Instances are long-term commitments (1 or 3 years) with guaranteed availability at a discounted rate. Spot Instances are short-term and interruptible, but much cheaper. Platforms like Northflank are better suited for Spot use cases where you want low cost without long commitments.

  9. What is the difference between Spot Instances and Savings Plans?

    Savings Plans give you a discount in exchange for a steady usage commitment over time. Spot Instances have no commitment, but pricing and availability can change quickly. Northflank doesn’t require you to lock in a plan; you just choose how to schedule workloads.

  10. Are Spot Instances cheaper than Reserved Instances?

    Yes, usually. Spot Instances offer bigger discounts because they’re not guaranteed to run continuously. Reserved Instances are more stable, but cost more than Spot. Northflank lets you choose what’s best for your jobs based on priority and budget.

  11. How to set up Spot Instances?

    You can configure them directly through AWS or by using orchestration platforms like Northflank, which lets you assign labels and tags to route workloads to Spot or On-Demand pools with minimal setup.

  12. What are the risks of Spot Instances?

    The main risk is that they can be interrupted at any time. Without a fallback mechanism, your jobs could fail or be delayed. Platforms like Northflank help reduce this risk by automatically shifting jobs to On-Demand pools when Spot capacity isn’t available.
