
Weights uses Northflank to scale to millions of users without a DevOps team
TL;DR
JonLuca DeCaro, ex-Citadel and Pinterest engineer, could have built his own infrastructure from scratch. Instead, he used Northflank to scale Weights into a multi-cloud, GPU-optimized AI platform serving millions.
With 9 clusters across AWS, GCP, and Azure, 40+ microservices, 250+ concurrent GPUs, 10,000+ AI training jobs, and half a million inference runs per day, Weights operates at scale, and does it so seamlessly that most Series B+ startups would envy it.
Northflank automates everything—from container orchestration to workload scheduling—so a two-person team can run what would typically take an entire infra org.
The results: seamless cloud migration in hours instead of weeks, aggressive spot instance optimization, and a 7-minute model load time slashed to 55 seconds, cutting GPU costs dramatically.
For Weights, Northflank eliminated the need for Kubernetes management, CI/CD headaches, multi-cloud balancing, and endless DevOps overhead.
If JonLuca uses it, so should you. 🙂
Sometimes, the best engineering teams are the ones you don't need to hire.
That's what JonLuca DeCaro, founder of Weights, discovered when he turned to Northflank for infrastructure.
JonLuca isn't just any startup founder—he's a former Citadel and Pinterest engineer, where he built and scaled complex systems handling millions of users. If anyone could have built their own infra from scratch, it was him. Instead, he chose Northflank.
With only two engineers, they've built a consumer AI platform that serves millions of users—all without a dedicated DevOps team.
The problem
Scaling AI with constrained resources
In late 2023, Weights began as a local AI application for voice cloning. Their technical edge came from rewriting open-source AI models to run efficiently on consumer hardware, optimizing for edge inference rather than cloud deployment.
Users loved the performance, but they wanted more: a web version that could run on any device, including phones.
The transition from edge to cloud wasn't as much of a technical challenge as it was an existential one.
"We were a bootstrapped consumer startup. We faced this chicken-and-egg problem where we needed to monetize, but we couldn't until we launched and had no startup capital."
They had cloud credits but lacked the infrastructure expertise to leverage them effectively.
Switching from manual deployments to automated infrastructure
Phase 1: The Manual Era
Weights started out in a very hacky way.
"We were spinning up a single instance with a spot A100, SSH-ing in, doing a git pull, and starting services manually."
This approach worked for about a week before user demand exposed its limitations.
Phase 2: We need scalability!
As demand grew, they evaluated several options:
Self-managed Kubernetes clusters
Cloud-native deployment solutions
Managed container platforms
DevOps automation tools
Fractional DevOps consultants
Phase 3: ✨ Northflank ✨
"We wanted something that felt like Vercel for the backend. Where I can hook up my GitHub repo, write a single Dockerfile, and with one click, everything else just deploys. Autoscaling, builds, container registry, networking—everything just works."
The solution
Building a multi-cloud AI platform
"The average Series B startup doesn't have nine clusters across three separate clouds. Most startups wouldn't be able to reach this point without a full team of DevOps and deployment engineers. We're able to do it without one at all."
The infrastructure Weights built with Northflank is sophisticated yet manageable by a small team. Here's how it breaks down:
Architecture
9 clusters across AWS, GCP, and Azure
40+ microservices handling different AI workloads
250+ GPU instances running simultaneously
Custom node pools for specific workload types
Integrated logging and monitoring systems
Workloads
10,000 daily AI training jobs
500,000 content creations per day
150TB monthly data transfer
Half a petabyte of user-generated content
Optimizing GPU usage
Weights implemented a sophisticated approach to GPU resource management:
Workloads designed for interruptibility and self-healing
Spot instance orchestration across clouds
VRAM-based GPU type selection
Time-slicing for optimal resource utilization
Multi-read-write cache layers for model loading
When you're paying by the minute for GPUs, every optimization counts.
"We cut our model loading time from 7 minutes to 55 seconds with Northflank's multi-read-write cache layer—that's direct savings on our GPU costs."
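Designing workloads for interruptibility usually comes down to trapping the preemption signal and checkpointing progress: cloud providers send SIGTERM shortly before reclaiming a spot VM, and the orchestrator forwards it to the container. The sketch below illustrates the pattern only; all names are hypothetical and this is not Weights' actual code.

```python
import json
import signal
from pathlib import Path

class InterruptibleJob:
    """Toy training loop that survives spot-instance preemption.

    A SIGTERM handler sets a stop flag; the loop checkpoints
    periodically and once more on exit, so a replacement instance
    can resume from the last saved step instead of starting over.
    """

    def __init__(self, checkpoint_path: Path, total_steps: int = 1000):
        self.checkpoint_path = checkpoint_path
        self.total_steps = total_steps
        self.step = self._load_checkpoint()
        self.stop = False
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.stop = True  # finish the current step, then checkpoint and exit

    def _load_checkpoint(self) -> int:
        if self.checkpoint_path.exists():
            return json.loads(self.checkpoint_path.read_text())["step"]
        return 0

    def _save_checkpoint(self):
        self.checkpoint_path.write_text(json.dumps({"step": self.step}))

    def run(self) -> int:
        while self.step < self.total_steps and not self.stop:
            self.step += 1  # stand-in for one unit of real training work
            if self.step % 100 == 0:
                self._save_checkpoint()
        self._save_checkpoint()  # always persist progress on the way out
        return self.step
```

With workloads shaped like this, the platform is free to kill and reschedule pods across spot pools; the job picks up where the last checkpoint left off.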
Infrastructure as Code (IaC)
First-class developer experience
The deployment workflow at Weights exemplifies modern DevOps practices without the overhead:
CI/CD pipeline
Code push → GitHub repository
Northflank build trigger analysis
Automated environment variable configuration
Docker build with optimized cache layers
Artifact registry push
Health check validation
Zero-downtime deployment
"The entire setup for launching a new service is probably five minutes. You point it to the Dockerfile, set the build rules and environment variables, click save, and then just don't think about it again."
There's more
As their platform evolved, Weights leveraged Northflank's ecosystem for additional capabilities:
Development workflow integration
TypeScript client for API automation
Template-based resource provisioning
Automated health checks and rollbacks
Cross-cluster resource orchestration
Integrated metrics and alerting
Operational tooling
Native job scheduling for 30-35 cron jobs
Datadog log aggregation and analysis
Redis deployment for acquired products
Custom node pool management
Cross-cloud resource balancing
The results (💸)
The cost benefits came in multiple forms.
1. Cloud flexibility
"Moving from Azure to GCP would typically be a massive migration—something you'd debate if it's worth 4-6 weeks of engineer time. With Northflank, it was an afternoon. We could one-click redeploy all our jobs and instantly leverage new cloud credits."
2. Spot instance optimization
Weights wrote their workloads to be interruptible and self-healing, then let Northflank handle the orchestration. This gave them a significant cost advantage through spot pricing.
3. Performance optimizations
A multi-read-write cache layer implemented through Northflank cut their model loading time from 7 minutes to 55 seconds—crucial savings when you're paying by the minute for GPUs.
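The idea behind a shared read-write cache for model loading is that the first replica pays the slow download from object storage and every later replica hits the warm copy. A minimal sketch, assuming a `fetch(url, dest)` callable standing in for the download and a `cache_dir` on a volume mounted by many GPU replicas; the names and layout are illustrative, not Weights' or Northflank's actual implementation.

```python
import hashlib
from pathlib import Path

def load_model_weights(model_url: str, fetch, cache_dir: Path) -> Path:
    """Return a local path for model weights, downloading only on a cache miss.

    The URL is hashed into a stable cache key; on a miss, the file is
    written to a temp name and atomically renamed, so concurrent readers
    never observe a partially written model.
    """
    key = hashlib.sha256(model_url.encode()).hexdigest()
    cached = cache_dir / key
    if cached.exists():
        return cached  # warm start: skip the multi-minute download
    tmp = cached.with_suffix(".tmp")
    fetch(model_url, tmp)
    tmp.rename(cached)  # atomic publish of the finished file
    return cached
```

When GPUs are billed by the minute, turning a cold download into a local file open is exactly the kind of change that takes a 7-minute load down toward tens of seconds.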
4. Team size
"If we didn't have Northflank managing everything, just keeping track of the Kubernetes clusters, setting up registries, actually running all of it—I think it's three to five people at this point," JonLuca estimates.
Looking forward 🫡
"When you're a small seed-stage startup, the founder's time is invaluable. Any time spent fiddling with builds and DevOps pipelines is not spent building your product or finding product-market fit."
As they continue to scale, Weights is exploring advanced features like:
Automated spot market arbitrage across clouds
Enhanced cost optimization through usage analytics
Advanced performance monitoring and optimization
Cross-cluster resource sharing and balancing
Automated workload distribution based on regional demand
Focus on workloads, not infrastructure
"Speed is everything," JonLuca advises other startups.
"Now that something like Northflank exists, there's no reason not to use it. It'll let you move faster, figure out what your company is doing, save you money, and save you time."
For Weights, this meant transforming from a local AI app to serving millions of users across nine clusters—all while maintaining a lean, product-focused team.
That's the result of having infrastructure that WORKS.