Published 15th May 2025

How Cedana uses Northflank to deploy workloads onto Kubernetes with microVMs and secure runtimes

TL;DR

Cedana is building live migration and snapshot/restore infrastructure for GPU-heavy workloads, with applications ranging from resilient cloud infrastructure to on-prem clusters plagued by high GPU failure rates.

Founded by Niranjan Ravichandra and Neel Master, who come from aerospace and robotics backgrounds, the startup emerged out of YC in 2023 and is deeply technical, fully remote, and almost entirely made up of engineers.

They chose Northflank to avoid the burden of managing infrastructure manually, to run production deployments for customers, and to spin up complex Kubernetes environments with microVMs and secure runtimes (including Kata Containers and Cloud Hypervisor), which Northflank supports out of the box for enterprise customers.

As a result, they now deploy customer environments in one click, test secure runtime workloads, avoid vendor lock-in from services like PubSub and RDS, and ship infrastructure tools faster.

The problem

Infrastructure-level resilience in a GPU-constrained world

"We started back in 2023, with the idea of making systems more resilient," said Niranjan, CTO and co-founder of Cedana. "I come from an aerospace background and Neel comes from a long robotics background as well."

Both founders had firsthand experience with large, mission-critical systems and the rigorous standards of uptime and fault tolerance required in those domains. The software world, especially in cloud and AI infrastructure, didn’t reflect that same level of rigor.

"Given the high failure rate that we've both experienced in our careers, we thought it’d be great to take some of those learnings and bring them down to earth. Literally."

They initially envisioned a resilience platform, but it quickly evolved: "What we're building now is effectively live migration as a service."

Live migration is notoriously difficult, especially for GPUs. Cedana targets two customer segments:

Infrastructure providers: cloud platforms or "neo-clouds" looking for dynamic compute migration
On-prem users: orgs running their own clusters who need ways to minimize downtime and manage hardware failure, especially GPU-related

"GPUs fail at a higher rate than anything else. And as we increase with successive generations, the failure rate just gets harder and harder."

The pace of hardware churn also makes operationalization hard:

"Coupled with the pressures of trying to get a new fleet of GPUs in every year or so, it makes it very difficult for smaller organizations that are building on-prem clusters to manage their GPUs efficiently."

GPU snapshot and restore offers an alternative to full live migration. In practice, Cedana supports both.

"We become a proxy to the weights on the GPU itself. So companies are using us to circumvent the need to manage weights themselves. And because we capture all the runtime state, the cold start time is like two to 10 times faster."

They're also testing secure compute use cases:

"We've also been dipping our toes into the world of Kata and Cloud Hypervisor for confidential and secure computing as well."

Live migration, snapshot/restore, and confidential computing make a potent stack. But getting there requires a platform that doesn’t fight you.

The solution

From internal prototyping to full customer environments

Cedana discovered Northflank while trying to avoid reinventing the internal tooling they'd once had at larger companies.

"At my last company, which was acquired by Shopify, we had a lot of nice internal tooling that abstracted away some of the complexities of the cloud. We were a GCP shop, and we just didn’t have to worry about it."

Northflank gives teams a self-serve, first-class platform they don’t have to build from scratch or maintain. It covers what internal tools usually do, but without the overhead. Most teams, startups or enterprises, are better off focusing their engineers on the product, not on infrastructure toil.

Hiring engineers at Cedana meant Niranjan needed a way to offer a frictionless development environment:

"Stumbled upon Northflank, and it just kind of smoothed over a lot of the rough edges I was anticipating with working in the cloud with the team."

The Cedana team uses Northflank in two key ways:

1. Testing custom infrastructure components

One engineer is using Northflank for testing Kata-based workloads:

"One of our engineers, for example, makes use of the fact that you can deploy Northflank with Kata containers in GCP, and is just using that for our Kata cloud hypervisor checkpoint restore testing."

Cluster lifecycle automation takes care of provisioning:

"Northflank manages the cluster, creates it on our behalf, and then we can either choose to deploy things via Northflank or just kubectl apply on our end."

They also SSH into nodes directly to test secure runtimes:

"The cluster's been created with Kata and Cloud Hypervisor already working on it, and [an engineer] just SSHs into the nodes and messes with them directly."
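For readers unfamiliar with how a workload opts into a sandboxed runtime on Kubernetes, here is a minimal sketch in Python. It is not Cedana's or Northflank's configuration. It only illustrates the standard runtimeClassName field of a Pod spec. The RuntimeClass name kata-clh, the Pod name, and the image are all assumptions made for illustration; the real RuntimeClass name depends on how Kata is installed on the cluster.

```python
import json


def kata_pod_manifest(name, image, runtime_class="kata-clh"):
    """Build a Pod manifest that requests a sandboxed runtime.

    runtime_class is an assumed RuntimeClass name; list the ones that
    actually exist on a cluster with `kubectl get runtimeclass`.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Standard Kubernetes field selecting the container runtime handler.
            "runtimeClassName": runtime_class,
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical Pod name and placeholder image.
    print(json.dumps(kata_pod_manifest("checkpoint-test", "busybox:1.36"), indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped into kubectl apply -f - against a cluster that has a matching RuntimeClass.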

2. Deploying customer-facing infrastructure

Cedana uses Northflank to host production environments, not just staging or test:

"We do serve production customers through Northflank. So we have a template that we deploy for every customer, effectively, that we define in Northflank with a couple of microservices, a Postgres database... incredibly easy to set up."

With each new pilot or POC:

"All I have to do is just reapply that template."
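The idea of one template stamped out per customer can be sketched generically in Python. This does not reproduce Northflank's template format, which the article does not show. The field names, the service names, and the customer- prefix are hypothetical and only illustrate the pattern of copying a shared definition and filling in per-customer values.

```python
from copy import deepcopy

# Hypothetical base definition; not Northflank's actual template schema.
BASE_TEMPLATE = {
    "services": ["api", "worker"],
    "addons": ["postgres"],
}


def render_customer_env(customer, extra_addons=None):
    """Return a fresh per-customer copy of the base template."""
    env = deepcopy(BASE_TEMPLATE)  # never mutate the shared definition
    env["project"] = f"customer-{customer}"
    env["addons"] = env["addons"] + list(extra_addons or [])
    return env


if __name__ == "__main__":
    print(render_customer_env("acme", extra_addons=["rabbitmq"]))
```

Keeping the base definition immutable means every pilot or POC starts from the same known-good shape, and anything customer-specific is an explicit argument.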

They also use Northflank's add-ons for RabbitMQ:

"Previously, we were playing around with using Google PubSub... a couple of customers came to us and asked us for a self-hosting solution. So we decided to kind of rip that out and switch to RabbitMQ."

Northflank's flexibility made the switch possible:

"A nice benefit of all of this is that Northflank is not prescriptive in how they want you to deploy stuff. You can just run any container you want."

"If I want to just helm install my Helm chart onto a cluster, I can. At the end of the day, it's just Kubernetes."

The results

Production-grade infrastructure without a platform team

No support overhead: "I've almost never had to contact support... And on the occasions that I did, super responsive."
Multi-cloud and on-prem portability: "Northflank isn’t just avoiding cloud lock-in, it's avoiding service lock-in."
Isolated, production-grade clusters by default: Cedana runs secure Kubernetes workloads in microVMs with full sandboxing and runtime isolation. Provisioning, scaling, and teardown are fully managed, with no custom scripting required.
Enterprise-readiness out of the box: "Things that you would want that feel like defaults and should be given, they're just there. Like MFA support, things like that SOC 2 requires."
SOC 2 compliance with minimal lift: Cedana successfully completed their SOC 2 Type I audit using Northflank and are now deep in their Type II process. "We needed things like audit logs and MFA support—Northflank already had them built-in. I was literally just taking screenshots of the platform for our auditors."

Even their customer delivery model is evolving around Northflank:

"The next step... is kind of take this model that Northflank has let us build out, package that into a couple of Helm charts, and just give that to customers. It will look like they have a Northflank deployment inside their own clusters."

Final thoughts

"I never hit a strange guardrail. In a video game where it’s fake open world, you'll run into a wall. It's kind of similar with a lot of other platforms. But here, it just works."

Most platforms promise flexibility, but often hit you with invisible walls once you try anything remotely advanced (like deploying your own Helm charts, or manually configuring a GPU runtime). Cedana didn’t want a “pretend open world” where you’re nudged back to the path the platform thinks you should take.

With Northflank, they didn’t encounter these walls. Whether it’s launching a GPU-enabled Kubernetes cluster, SSH-ing into nodes, or testing Cloud Hypervisor inside a VM, everything worked as expected.

Northflank acts like a game engine with great defaults and a modding API. You can click-to-deploy and get up and running fast, but if you want to go deep and build your own systems, it’s all there too. That’s why Cedana can go from proof of concept to production without rewriting how they work.
