
Weights uses Northflank to scale to millions of users without a DevOps team
TL;DR
JonLuca DeCaro, ex-Citadel and Pinterest engineer, could have built his own infrastructure from scratch. Instead, he used Northflank to scale Weights into a multi-cloud, GPU-optimized AI platform serving millions.
With 9 clusters across AWS, GCP, and Azure, 40+ microservices, 250+ concurrent GPUs, 10,000+ AI training jobs, and half a million inference runs per day, Weights operates at serious scale, and does it so seamlessly that most Series B+ startups would envy the setup.
Northflank automates everything—from container orchestration to workload scheduling—so a two-person team can run what would typically take an entire infra org.
The results: seamless cloud migration in hours instead of weeks, aggressive spot instance optimization, and a 7-minute model load time slashed to 55 seconds, cutting GPU costs dramatically.
For Weights, Northflank eliminated the need for Kubernetes management, CI/CD headaches, multi-cloud balancing, and endless DevOps overhead.
If JonLuca uses it, so should you. 🙂
Sometimes, the best engineering teams are the ones you don't need to hire.
That's what JonLuca DeCaro, founder of Weights, discovered when he turned to Northflank for infrastructure.
JonLuca isn't just any startup founder—he's a former Citadel and Pinterest engineer, where he built and scaled complex systems handling millions of users. If anyone could have built their own infra from scratch, it was him. Instead, he chose Northflank.
With only two engineers, they've built a consumer AI platform that serves millions of users—all without a dedicated DevOps team.
In late 2023, Weights began as a local AI application for voice cloning. Their technical edge came from rewriting open-source AI models to run efficiently on consumer hardware, optimizing for edge inference rather than cloud deployment.
Users loved the performance, but they wanted more: a web version that could run on any device, including phones.
The transition from edge to cloud wasn't as much of a technical challenge as it was an existential one.
"We were a bootstrapped consumer startup. We faced this chicken-and-egg problem where we needed to monetize, but we couldn't until we launched and had no startup capital."
They had cloud credits but lacked the infrastructure expertise to leverage them effectively.
Weights started out in a very hacky way.
"We were spinning up a single instance with a spot A100, SSH-ing in, doing a git pull, and starting services manually."
This approach worked for about a week before user demand exposed its limitations.
As demand grew, they evaluated several options:
- Self-managed Kubernetes clusters
- Cloud-native deployment solutions
- Managed container platforms
- DevOps automation tools
- Fractional DevOps consultants
"We wanted something that felt like Vercel for the backend. Where I can hook up my GitHub repo, write a single Dockerfile, and with one click, everything else just deploys. Autoscaling, builds, container registry, networking—everything just works."
"The average Series B startup doesn't have nine clusters across three separate clouds, Most startups wouldn't be able to reach this point without a full team of DevOps and deployment engineers. We're able to do it without one at all."
The infrastructure Weights built with Northflank is sophisticated yet manageable by a small team. Here's how it breaks down:
- 9 clusters across AWS, GCP, and Azure
- 40+ microservices handling different AI workloads
- 250+ instances running simultaneously
- Custom node pools for specific workload types
- Integrated logging and monitoring systems
- 10,000 daily AI training jobs
- 500,000 content creations per day
- 150TB monthly data transfer
- Half a petabyte of user-generated content
Weights implemented a sophisticated approach to GPU resource management:
- Workloads designed for interruptibility and self-healing (see the sketch after this list)
- Spot instance orchestration across clouds
- VRAM-based GPU type selection
- Time-slicing for optimal resource utilization
- Multi-read-write cache layers for model loading
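The first item on that list is application-level discipline as much as platform tooling. Here is a minimal sketch of what an interruptible, self-healing job can look like: it checkpoints progress to persistent storage and handles SIGTERM so a spot preemption only costs the work done since the last checkpoint. The checkpoint path, step loop, and exit-code convention are illustrative placeholders, not Weights' actual code.

```python
import json
import os
import signal
import sys
import time

# Hypothetical checkpoint location; in practice this would sit on a
# persistent volume or object store that outlives the spot instance.
CHECKPOINT_PATH = os.environ.get("CHECKPOINT_PATH", "/cache/checkpoint.json")

stop_requested = False


def request_stop(signum, frame):
    """SIGTERM arrives when the platform drains a spot node: finish the
    current step, persist progress, and exit cleanly so the rescheduled
    container can resume where this one left off."""
    global stop_requested
    stop_requested = True


signal.signal(signal.SIGTERM, request_stop)


def load_checkpoint():
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)["step"]
    return 0


def save_checkpoint(step):
    tmp = CHECKPOINT_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, CHECKPOINT_PATH)  # atomic swap avoids torn checkpoints


def run_job(total_steps=1000):
    step = load_checkpoint()
    while step < total_steps and not stop_requested:
        time.sleep(0.1)  # stand-in for one unit of training/inference work
        step += 1
        if step % 10 == 0:
            save_checkpoint(step)
    save_checkpoint(step)
    # Zero means the job finished; non-zero tells the orchestrator's
    # restart policy to reschedule it after an interruption.
    sys.exit(0 if step >= total_steps else 1)


if __name__ == "__main__":
    run_job()
```

Exiting non-zero on interruption lets the platform's restart policy reschedule the job, which is what turns spot preemptions from failures into brief pauses rather than lost work.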
When you're paying by the minute for GPUs, every optimization counts.
"We cut our model loading time from 7 minutes to 55 seconds with Northflank's multi-read-write cache layer—that's direct savings on our GPU costs."
The deployment workflow at Weights exemplifies modern DevOps practices without the overhead:
- Code push → GitHub repository
- Northflank build trigger analysis
- Automated environment variable configuration
- Docker build with optimized cache layers
- Artifact registry push
- Health check validation (a minimal example follows this list)
- Zero-downtime deployment
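The health check step is the one piece the application has to cooperate with. Below is a minimal sketch of a readiness endpoint in Python; the /healthz path, port, and simulated load delay are illustrative, not Weights' actual service. The replica answers 503 until its model is in memory, so traffic only shifts to a new version once it can actually serve.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

model_ready = False  # flipped once weights are loaded and ready to serve


def load_model():
    """Placeholder for the real model load (the step cut from 7 minutes to
    55 seconds above); here it just sleeps to simulate the warm-up."""
    global model_ready
    time.sleep(5)
    model_ready = True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz" and model_ready:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            # 503 keeps the new replica out of rotation until it is warm,
            # which is what makes the rollout zero-downtime.
            self.send_response(503)
            self.end_headers()
            self.wfile.write(b"loading")

    def log_message(self, *args):
        pass  # keep container logs quiet for probe traffic


if __name__ == "__main__":
    threading.Thread(target=load_model, daemon=True).start()
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Pointing the platform's health check at an endpoint like this is what lets a rolling deploy keep old replicas serving until the new ones are genuinely ready.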
"The entire setup for launching a new service is probably five minutes. You point it to the Dockerfile, set the build rules and environment variables, click save, and then just don't think about it again."
As their platform evolved, Weights leveraged Northflank's ecosystem for additional capabilities:
- TypeScript client for API automation
- Template-based resource provisioning
- Automated health checks and rollbacks
- Cross-cluster resource orchestration
- Integrated metrics and alerting
- Native job scheduling for 30-35 cron jobs
- Datadog log aggregation and analysis
- Redis deployment for acquired products
- Custom node pool management
- Cross-cloud resource balancing
The cost benefits came in multiple forms.
"Moving from Azure to GCP would typically be a massive migration—something you'd debate if it's worth 4-6 weeks of engineer time. With Northflank, it was an afternoon. We could one-click redeploy all our jobs and instantly leverage new cloud credits."
Weights wrote their workloads to be interruptible and self-healing, then let Northflank handle the orchestration. This gave them a significant cost advantage through spot pricing.
A multi-read-write cache layer implemented through Northflank cut their model loading time from 7 minutes to 55 seconds—crucial savings when you're paying by the minute for GPUs.
"If we didn't have Northflank managing everything, just keeping track of the Kubernetes clusters, setting up registries, actually running all of it—I think it's three to five people at this point," JonLuca estimates.
"When you're a small seed-stage startup, the founder's time is invaluable. Any time spent fiddling with builds and DevOps pipelines is not spent building your product or finding product-market fit."
As they continue to scale, Weights is exploring advanced features like:
- Automated spot market arbitrage across clouds
- Enhanced cost optimization through usage analytics
- Advanced performance monitoring and optimization
- Cross-cluster resource sharing and balancing
- Automated workload distribution based on regional demand
"Speed is everything," JonLuca advises other startups.
"Now that something like Northflank exists, there's no reason not to use it. It'll let you move faster, figure out what your company is doing, save you money, and save you time."
For Weights, this meant transforming from a local AI app to serving millions of users across nine clusters—all while maintaining a lean, product-focused team.
That's the result of having infrastructure that WORKS.