Published 6th February 2025

Weights uses Northflank to scale to millions of users without a DevOps team

TL;DR

JonLuca DeCaro, ex-Citadel and Pinterest engineer, could have built his own infrastructure from scratch. Instead, he used Northflank to scale Weights into a multi-cloud, GPU-optimized AI platform serving millions.

With 9 clusters across AWS, GCP, and Azure, 40+ microservices, 250+ concurrent GPUs, 10,000+ AI training jobs, and half a million inference runs per day, Weights operates seamlessly at a scale most Series B+ startups would envy.

Northflank automates everything—from container orchestration to workload scheduling—so a two-person team can run what would typically take an entire infra org.

The results: seamless cloud migration in hours instead of weeks, aggressive spot instance optimization, and a 7-minute model load time slashed to 55 seconds, cutting GPU costs dramatically.

For Weights, Northflank took Kubernetes management, CI/CD headaches, multi-cloud balancing, and endless DevOps overhead off their plate.

If JonLuca uses it, so should you. 🙂

Sometimes, the best engineering teams are the ones you don't need to hire. 

That's what JonLuca DeCaro, founder of Weights, discovered when he turned to Northflank for infrastructure.

JonLuca isn't just any startup founder: he's a former Citadel and Pinterest engineer who built and scaled complex systems handling millions of users. If anyone could have built their own infra from scratch, it was him. Instead, he chose Northflank.

With only two engineers, they've built a consumer AI platform that serves millions of users—all without a dedicated DevOps team.

The problem

Scaling AI with constrained resources

In late 2023, Weights began as a local AI application for voice cloning. Their technical edge came from rewriting open-source AI models to run efficiently on consumer hardware, optimizing for edge inference rather than cloud deployment. 

Users loved the performance, but they wanted more: a web version that could run on any device, including phones.

The transition from edge to cloud wasn't as much of a technical challenge as it was an existential one. 

"We were a bootstrapped consumer startup. We faced this chicken-and-egg problem where we needed to monetize, but we couldn't until we launched and had no startup capital." 

They had cloud credits but lacked the infrastructure expertise to leverage them effectively.

Switching from manual deployments to automated infrastructure

Phase 1: The manual era

Weights started out in a very hacky way.

"We were spinning up a single instance with a spot A100, SSH-ing in, doing a git pull, and starting services manually." 

This approach worked for about a week before user demand exposed its limitations.

Phase 2: We need scalability!

As demand grew, they evaluated several options:

  • Self-managed Kubernetes clusters

  • Cloud-native deployment solutions

  • Managed container platforms

  • DevOps automation tools

  • Fractional DevOps consultants

Phase 3: ✨ Northflank ✨

"We wanted something that felt like Vercel for the backend. Where I can hook up my GitHub repo, write a single Dockerfile, and with one click, everything else just deploys. Autoscaling, builds, container registry, networking—everything just works."

The solution

Building a multi-cloud AI platform

"The average Series B startup doesn't have nine clusters across three separate clouds, Most startups wouldn't be able to reach this point without a full team of DevOps and deployment engineers. We're able to do it without one at all."

The infrastructure Weights built with Northflank is sophisticated yet manageable by a small team. Here's how it breaks down:

Architecture 

  • 9 clusters across AWS, GCP, and Azure

  • 40+ microservices handling different AI workloads

  • 250+ instances running simultaneously

  • Custom node pools for specific workload types

  • Integrated logging and monitoring systems


Workloads

  • 10,000 daily AI training jobs

  • 500,000 content creations per day

  • 150TB monthly data transfer

  • Half a petabyte of user-generated content

Optimizing GPU usage

Weights implemented a sophisticated approach to GPU resource management:

  1. Workloads designed for interruptibility and self-healing

  2. Spot instance orchestration across clouds

  3. VRAM-based GPU type selection

  4. Time-slicing for optimal resource utilization

  5. Multi-read-write cache layers for model loading

When you're paying by the minute for GPUs, every optimization counts.

"We cut our model loading time from 7 minutes to 55 seconds with Northflank's multi-read-write cache layer—that's direct savings on our GPU costs."

Infrastructure as Code (IaC)

First-class developer experience

The deployment workflow at Weights exemplifies modern DevOps practices without the overhead:

CI/CD pipeline

  1. Code push → GitHub repository

  2. Northflank build trigger analysis

  3. Automated environment variable configuration

  4. Docker build with optimized cache layers

  5. Artifact registry push

  6. Health check validation

  7. Zero-downtime deployment

"The entire setup for launching a new service is probably five minutes. You point it to the Dockerfile, set the build rules and environment variables, click save, and then just don't think about it again."

There's more

As their platform evolved, Weights leveraged Northflank's ecosystem for additional capabilities:

Development workflow integration

  • TypeScript client for API automation (see the provisioning sketch after this list)

  • Template-based resource provisioning

  • Automated health checks and rollbacks

  • Cross-cluster resource orchestration

  • Integrated metrics and alerting
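
The TypeScript client and template bullets above are what make provisioning scriptable. The sketch below shows the general shape of creating a build-and-deploy service over Northflank's REST API from Node 18+; the endpoint path, payload fields, project ID, and repository URL are assumptions modeled on the API's general pattern, so treat the official API reference (or the TypeScript client) as the source of truth.

```typescript
// Hedged sketch of API-driven provisioning. The endpoint path, payload fields,
// project ID, and repository URL below are assumptions modeled on the general
// shape of Northflank's combined-service API; check the official API reference
// (or the TypeScript client) for the real signatures before using this.
import process from "node:process";

const NF_API = "https://api.northflank.com/v1";   // assumed base URL
const token = process.env.NORTHFLANK_TOKEN;       // API token from the dashboard
const projectId = "weights-production";           // hypothetical project ID

async function createService(): Promise<unknown> {
  const res = await fetch(`${NF_API}/projects/${projectId}/services/combined`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      // Same inputs as the five-minute UI flow quoted earlier: a repo, a
      // Dockerfile, and a handful of settings. Field names are approximate.
      name: "inference-worker",
      vcsData: {
        projectUrl: "https://github.com/example/inference-worker", // hypothetical repo
        projectBranch: "main",
      },
      buildSettings: { dockerfile: { dockerFilePath: "/Dockerfile" } },
      deployment: { instances: 1 },
    }),
  });
  if (!res.ok) throw new Error(`Northflank API returned ${res.status}`);
  return res.json();
}

createService().then((service) => console.log("created:", service));
```

The same call shape, wrapped in templates, is one way resource provisioning can be automated instead of clicked through in a dashboard.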

Operational tooling

  • Native job scheduling for 30-35 cron jobs (a minimal job entrypoint is sketched after this list)

  • Datadog log aggregation and analysis

  • Redis deployment for acquired products

  • Custom node pool management

  • Cross-cloud resource balancing
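
The scheduled jobs in that list are just containers that run to completion on a cron schedule. A minimal sketch of such a job's entrypoint follows; the artifact-pruning task is hypothetical and stands in for any of the 30-35 jobs.

```typescript
// Hedged sketch of a scheduled job entrypoint: do one batch of work, log, and
// exit. The platform's cron schedule decides when it runs; the exit code tells
// the scheduler whether the run succeeded. The pruning task is hypothetical.
import process from "node:process";

async function pruneExpiredArtifacts(): Promise<number> {
  // e.g. delete generated audio older than N days from object storage (stubbed)
  return 0;
}

async function main(): Promise<void> {
  const started = Date.now();
  try {
    const removed = await pruneExpiredArtifacts();
    console.log(`pruned ${removed} artifacts in ${Date.now() - started}ms`);
    process.exit(0); // success: the run is marked completed
  } catch (err) {
    console.error("job failed:", err);
    process.exit(1); // non-zero exit lets the scheduler flag or retry the run
  }
}

void main();
```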

The results (💸)

The cost benefits came in multiple forms.

1. Cloud flexibility

"Moving from Azure to GCP would typically be a massive migration—something you'd debate if it's worth 4-6 weeks of engineer time. With Northflank, it was an afternoon. We could one-click redeploy all our jobs and instantly leverage new cloud credits."

2. Spot instance optimization

Weights wrote their workloads to be interruptible and self-healing, then let Northflank handle the orchestration. This gave them a significant cost advantage through spot pricing.

3. Performance optimizations

A multi-read-write cache layer implemented through Northflank cut their model loading time from 7 minutes to 55 seconds—crucial savings when you're paying by the minute for GPUs.

4. Team size

"If we didn't have Northflank managing everything, just keeping track of the Kubernetes clusters, setting up registries, actually running all of it—I think it's three to five people at this point," JonLuca estimates.

Looking forward 🫡

"When you're a small seed-stage startup, the founder's time is invaluable. Any time spent fiddling with builds and DevOps pipelines is not spent building your product or finding product-market fit."

As they continue to scale, Weights is exploring advanced features like:

  • Automated spot market arbitrage across clouds

  • Enhanced cost optimization through usage analytics

  • Advanced performance monitoring and optimization

  • Cross-cluster resource sharing and balancing

  • Automated workload distribution based on regional demand

Focus on workloads, not infrastructure

"Speed is everything," JonLuca advises other startups. 

"Now that something like Northflank exists, there's no reason not to use it. It'll let you move faster, figure out what your company is doing, save you money, and save you time."

For Weights, this meant transforming from a local AI app to serving millions of users across nine clusters—all while maintaining a lean, product-focused team. 

That's the result of having infrastructure that WORKS.
