

Self-hosted AI sandboxes: Guide to secure code execution in 2026
Self-hosted AI sandboxes are isolated execution environments that run AI-generated code within your own infrastructure rather than relying on third-party managed services.
Why companies choose self-hosted sandboxes:
- Maintain data sovereignty and meet compliance requirements (GDPR, HIPAA, SOC2)
- Reduce latency by running sandboxes on the same network as LLM infrastructure
- Control costs at scale as managed sandbox pricing becomes unsustainable
- Keep sensitive data within their own security perimeter
Three paths to self-hosted sandboxes:
- BYOC (Bring Your Own Cloud) platforms like Northflank: Managed orchestration deploying directly into your AWS, GCP, Azure, Oracle, Civo, CoreWeave, or bare-metal infrastructure with production-ready microVM isolation
- Fully managed services (E2B, Modal): Quick start but data leaves your infrastructure
- Open-source DIY (Firecracker, Kata Containers): Maximum control but requires months of engineering investment
Most enterprises find BYOC offers the ideal balance: you get self-hosted infrastructure control with sovereignty guarantees, without the operational burden of building and maintaining complex sandbox systems from scratch.
Companies move from managed sandbox services to self-hosted AI sandboxes to maintain control over their infrastructure, data, and costs. This guide covers the three deployment options, decision criteria for each, and what implementation involves.
Self-hosted AI sandboxes are secure, isolated environments for executing AI-generated code that run on infrastructure you own or control, rather than on a vendor's shared multi-tenant platform.
Unlike managed sandbox services where your code executes on someone else's servers, self-hosted sandboxes deploy directly into your cloud account, on-premises data center, or private infrastructure.
The core difference comes down to where the compute runs and who controls the data:
- Self-hosted sandboxes with BYOC: Platforms like Northflank manage the control plane (orchestration, monitoring, updates) while the data plane (actual compute and execution) runs in your infrastructure. You get managed operations with self-hosted control.
- Managed AI sandboxes: Code-execution-as-a-service running in a vendor's shared multi-tenant infrastructure. Best for prototyping and low-security workloads where compliance isn't a concern.
- Self-hosted AI sandboxes: Sovereign execution runtimes with isolation technology (Firecracker microVMs, gVisor, Kata Containers) running in your infrastructure. Best for production-scale agents, PII-handling, and regulated industries where data cannot leave your VPC.
This isn't just about running containers on your own servers. AI sandboxes require isolation beyond standard containers to safely execute untrusted code. Standard Docker containers share the host kernel, so a single kernel exploit in LLM-generated code, whether from a bug, a hallucinated dependency, or a prompt-injection payload, can compromise the host.
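To see the shared-kernel problem concretely, here's a minimal check you can run on a Linux host, assuming Docker and the `alpine` image are available:

```python
import subprocess

# Kernel version reported by the host.
host_kernel = subprocess.run(
    ["uname", "-r"], capture_output=True, text=True, check=True
).stdout.strip()

# Kernel version reported inside a standard Docker container.
container_kernel = subprocess.run(
    ["docker", "run", "--rm", "alpine", "uname", "-r"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# On a Linux host these match: the container has no kernel of its own,
# so a kernel exploit inside the sandbox is a host compromise.
print(f"host:      {host_kernel}")
print(f"container: {container_kernel}")
assert host_kernel == container_kernel
```

MicroVMs break this assumption: the same check inside a Firecracker or Kata guest reports the guest's own kernel, not the host's.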
Three critical barriers are forcing engineering teams to move sandbox infrastructure in-house:
For fintech, healthcare, and government sectors, regulatory demands make managed sandboxes non-viable. When your AI agent processes customer financial data or patient health records, that data cannot leave your VPC without creating GDPR or HIPAA exposure and SOC 2 audit findings.
Managed sandbox APIs act as third-party data processors, requiring complex data processing agreements and often disqualifying you from certain enterprise contracts.
Managed sandbox APIs also run in shared multi-tenant environments where your workloads execute alongside other customers' code, creating potential cross-tenant data exposure risks that compliance auditors scrutinize. Self-hosted sandboxes keep PII within your security perimeter, simplifying compliance audits and maintaining data sovereignty.
Real-time AI applications can't afford the round-trip time to external sandbox services. When your agent needs to execute code to answer a user question, 200-500ms of network latency to a managed API breaks the conversational flow.
Self-hosting sandboxes on the same network as your LLM inference cuts the network overhead to near zero. For AI coding assistants, data analysis tools, or autonomous agents making rapid decisions, this difference separates "feels instant" from "feels broken."
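The overhead is easy to measure. The sketch below times the same execution request against a managed endpoint and an in-VPC one; both URLs and the request schema are placeholders, not any specific vendor's API:

```python
import time
import requests  # pip install requests

# Placeholder endpoints; substitute your actual sandbox APIs.
REMOTE_SANDBOX = "https://sandbox.example-vendor.com/v1/execute"  # managed, over the internet
LOCAL_SANDBOX = "http://sandbox.internal:8080/v1/execute"         # same network as the LLM

payload = {"language": "python", "code": "print(2 + 2)"}

for name, url in [("managed", REMOTE_SANDBOX), ("self-hosted", LOCAL_SANDBOX)]:
    start = time.perf_counter()
    resp = requests.post(url, json=payload, timeout=10)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Wall-clock time includes network round-trip plus execution, so the
    # difference between the two runs approximates pure network overhead.
    print(f"{name}: {elapsed_ms:.0f} ms (status {resp.status_code})")
```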
Managed providers charge premium pricing for convenience. Early-stage usage costs are manageable, but as you grow to millions of code executions monthly, the markup becomes unsustainable.
For instance, cto.new hit this inflection point during their launch week. Thousands of daily deployments made managed sandbox costs prohibitive. By moving to self-hosted infrastructure with Northflank's BYOC platform, they gained cost predictability and economics that scaled with their growth.
| Factor | Managed sandboxes | Self-hosted / BYOC |
|---|---|---|
| Compliance | Third-party processor (high risk) | In-VPC residency (low risk) |
| Latency | Network round-trip (200ms+) | Local network (near-zero) |
| Cost at scale | Per-execution pricing (expensive) | Infrastructure-based (predictable) |
| Data control | Vendor infrastructure | Your infrastructure |
As we've covered in our analysis of the best code execution sandboxes for AI agents, the choice isn't just about features. It's about where your trust boundary lies and who controls your infrastructure.
When evaluating self-hosted AI sandbox solutions, you're choosing between three approaches, each with distinct tradeoffs:
| Approach | Infrastructure control | Operational burden | Best for |
|---|---|---|---|
| BYOC (Bring Your Own Cloud) platform (Northflank) | High (your cloud account) | Low (managed control plane) | Production scale, compliance-driven, enterprise |
| Managed SaaS (E2B, Modal, Daytona) | Low (vendor's infrastructure) | None | Early-stage, testing, proof-of-concept |
| Open-source DIY (Firecracker, microsandbox) | Total (you manage everything) | Very high | Unique requirements, extreme customization |
The three paths to self-hosted AI sandboxes differ in the level of infrastructure control you get versus the amount of operational work you take on.
BYOC represents the pragmatic middle ground for self-hosted sandboxes. Platforms like Northflank provide managed orchestration while deploying compute into your AWS, GCP, Azure, Oracle, Civo, CoreWeave, or on-premises infrastructure.
You get production-ready sandbox infrastructure with microVM isolation technologies (Kata Containers with Cloud Hypervisor, gVisor, Firecracker) running in your VPC. Data never leaves your infrastructure, which satisfies data-residency requirements, while Northflank handles orchestration, networking, scaling, and Day-2 operations.
This approach solves the self-hosting dilemma: you maintain sovereignty without building and maintaining complex sandbox infrastructure from scratch.
Platforms like E2B, Modal, and Daytona handle all infrastructure, offering simple APIs for code execution. You trade control for convenience. Great for validating product-market fit, but the barriers mentioned above eventually force migration.
For teams with specific requirements that no platform addresses, open-source tools offer maximum flexibility:
- Firecracker: AWS's microVM technology, sub-200ms boot times, hardware isolation (see the sketch after this list)
- Microsandbox: Experimental self-hosted platform with MicroVM support and MCP integration
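For a feel of what the DIY path involves, here's a minimal sketch that boots a single Firecracker microVM through its REST API. It assumes a `firecracker` process is already listening on a local Unix socket and that you've built a guest kernel (`vmlinux`) and root filesystem (`rootfs.ext4`); the paths are placeholders:

```python
import json
import subprocess

# Started beforehand with: firecracker --api-sock /tmp/firecracker.socket
SOCKET = "/tmp/firecracker.socket"

def fc_put(path: str, body: dict) -> None:
    # Firecracker exposes its API on a Unix socket; curl can talk to it directly.
    subprocess.run(
        ["curl", "--unix-socket", SOCKET, "-X", "PUT",
         f"http://localhost{path}",
         "-H", "Content-Type: application/json",
         "-d", json.dumps(body)],
        check=True,
    )

# Size the microVM: one vCPU and 128 MiB is plenty for small code executions.
fc_put("/machine-config", {"vcpu_count": 1, "mem_size_mib": 128})

# Guest kernel and root filesystem (placeholder paths for your own images).
fc_put("/boot-source", {"kernel_image_path": "./vmlinux",
                        "boot_args": "console=ttyS0 reboot=k panic=1"})
fc_put("/drives/rootfs", {"drive_id": "rootfs", "path_on_host": "./rootfs.ext4",
                          "is_root_device": True, "is_read_only": False})

# Boot the microVM; Firecracker typically completes this in well under a second.
fc_put("/actions", {"action_type": "InstanceStart"})
```

This boots exactly one VM. The snapshotting, pooling, networking, and teardown around it is where the real engineering time goes.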
The reality: building production-grade self-hosted sandbox infrastructure requires 6-12 months of dedicated engineering work. You're responsible for isolation technology, orchestration, networking security, monitoring, patching, and scaling. Most teams underestimate this complexity.
Not every team needs self-hosted infrastructure. Use this framework to determine if self-hosting is right for your situation:
| Scenario | Self-host now | Consider self-hosting | Stay with managed |
|---|---|---|---|
| Compliance | HIPAA, GDPR, FedRAMP requirements mandate data in your VPC | Enterprise customers asking security questions | No regulatory requirements |
| Data Sensitivity | Processing customer PII, financial records, health data | Handling proprietary business logic | Public or non-sensitive data |
| Scale | Over 1 million monthly executions | 100k to 1 million monthly executions | Under 100k monthly executions |
| Latency | Need under 50ms response times for real-time agents | 100 to 200ms acceptable | Over 500ms acceptable |
| Infrastructure | Have dedicated platform engineering team | Can allocate 1 to 2 engineers | No infrastructure capacity |
| Deployment | Enterprise requires on-premises or private cloud | Prefer infrastructure control | Speed to market critical |
If you're building self-hosted AI sandbox infrastructure from scratch, understanding the full scope prevents costly surprises down the line.
- Isolation layer: Choose between Firecracker microVMs (strongest isolation, AWS-proven), gVisor (user-space kernel interception, Google-developed), or Kata Containers (container UX with VM security). This isn't just running Docker: each execution needs its own kernel, or a layer that intercepts kernel calls on its behalf.
- Orchestration system: Something must manage thousands of ephemeral sandbox lifecycles, handle scheduling, and ensure resource efficiency. Kubernetes with Kata runtime classes works, but requires significant hardening for untrusted code.
- Networking security: Implement default-deny egress policies so AI agents can't exfiltrate data or scan internal networks. You'll need granular controls for which sandboxes can access external APIs versus remaining completely air-gapped (a sketch follows this list).
- API gateway: Your LLM application needs secure methods to submit code, stream execution output, retrieve results, and handle errors. This layer manages authentication, rate limiting, and routing to available sandbox capacity.
- Monitoring and observability: When a sandbox execution fails or gets compromised, you need detailed logging, metrics, and tracing to diagnose issues without exposing sensitive data.
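As an illustration of the networking-security layer, the sketch below uses the official Kubernetes Python client to apply a default-deny egress policy plus a labeled opt-in; the `sandboxes` namespace and the `egress: https` label are assumptions for this example:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
net = client.NetworkingV1Api()
NAMESPACE = "sandboxes"  # assumed namespace where sandbox pods run

# Default-deny: an Egress policy that selects every pod but allows nothing,
# so all outbound traffic from sandboxes is dropped.
deny_all = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-egress"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),  # empty selector = all pods
        policy_types=["Egress"],
    ),
)
net.create_namespaced_network_policy(NAMESPACE, deny_all)

# Opt-in exception: only pods explicitly labeled egress=https may reach
# external HTTPS APIs; every other sandbox stays air-gapped.
allow_https = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="allow-https-egress"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(match_labels={"egress": "https"}),
        policy_types=["Egress"],
        egress=[client.V1NetworkPolicyEgressRule(
            ports=[client.V1NetworkPolicyPort(protocol="TCP", port=443)],
        )],
    ),
)
net.create_namespaced_network_policy(NAMESPACE, allow_https)
```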
The DIY path could demand 2-3 senior infrastructure engineers working 3-6 months minimum, plus ongoing maintenance. BYOC platforms like Northflank handle this complexity while giving you infrastructure control. You get production-ready self-hosted sandboxes in weeks instead of months.
For technical implementation details, see our guides on spinning up secure microVMs and sandboxing AI agents.
As AI models become more capable and generate increasingly complex code, the security and compliance risks of third-party code execution grow proportionally. Enterprises building serious AI applications can't afford to send sensitive data to external sandbox APIs.

Self-hosted AI sandboxes, through BYOC platforms or DIY infrastructure, ensure your innovation never compromises your security. The question isn't if you'll need self-hosted sandboxes, but when the transition makes strategic and economic sense for your team.
Get started with Northflank's BYOC deployment to run production-grade self-hosted sandboxes in your cloud account, or dig into our technical guide on secure sandbox architecture and implementation.
Kubernetes can host self-hosted sandboxes, but standard pods aren't secure for untrusted AI code. You need runtime classes like Kata Containers for VM-level isolation. Self-hosted sandbox platforms on Kubernetes require specialized runtimes, resource quotas, and network policies to prevent sandbox escape.
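As a minimal sketch with the official Kubernetes Python client, assuming a RuntimeClass named `kata` is already installed on the cluster and a `sandboxes` namespace exists:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()

# Pod spec for one ephemeral sandbox execution. runtime_class_name selects
# the Kata runtime, so the pod boots its own guest kernel instead of
# sharing the host's.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="sandbox-run-1", labels={"app": "ai-sandbox"}),
    spec=client.V1PodSpec(
        runtime_class_name="kata",   # assumed RuntimeClass name
        restart_policy="Never",      # ephemeral: run once, collect, delete
        containers=[client.V1Container(
            name="executor",
            image="python:3.12-slim",
            command=["python", "-c", "print('hello from an isolated kernel')"],
            resources=client.V1ResourceRequirements(
                limits={"cpu": "500m", "memory": "256Mi"},
            ),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod("sandboxes", pod)
```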
BYOC (Bring Your Own Cloud) is a type of self-hosted deployment where the vendor manages the control plane while sandboxes run in your infrastructure. Pure self-hosting means you operate everything. Platforms like Northflank use BYOC to give you data sovereignty while handling orchestration and operations.
Self-hosted costs depend on your approach. DIY requires months of engineering work plus ongoing maintenance. BYOC platforms like Northflank remove this upfront work. You pay for compute resources while the platform manages infrastructure. At scale, self-hosted options typically cost less than managed per-execution pricing.
MicroVMs (Firecracker, Kata Containers) provide the strongest isolation with dedicated kernels. gVisor offers good security with lower overhead. Standard Docker containers aren't sufficient due to shared kernel vulnerabilities. Choose based on your security needs.
Self-hosted sandboxes can meet compliance requirements: they keep data in your VPC with proper isolation, network policies, and audit logging. However, compliance also requires documented security policies, access controls, encryption, and regular audits.
Prevent runaway executions with strict resource quotas: CPU limits, memory caps, disk I/O restrictions, and timeouts. Northflank's architecture makes these limits configurable per sandbox; DIY implementations must enforce and monitor them themselves.
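Here is a minimal sketch of those caps using the Docker SDK for Python. It illustrates quota enforcement only; in production you would pair these limits with a microVM runtime rather than rely on plain Docker for isolation:

```python
import docker  # pip install docker

client = docker.from_env()

# Run untrusted code with hard caps on memory, CPU, processes, and network.
container = client.containers.run(
    "python:3.12-slim",
    ["python", "-c", "print(sum(range(10**6)))"],
    detach=True,
    mem_limit="256m",        # memory cap: OOM-killed beyond 256 MiB
    nano_cpus=500_000_000,   # CPU cap: at most half a core
    pids_limit=64,           # process cap: blocks fork bombs
    network_disabled=True,   # no network unless a sandbox explicitly needs it
)

try:
    # Wall-clock timeout: give up if the sandbox runs longer than 30 seconds.
    result = container.wait(timeout=30)
    print(container.logs().decode(), "exit:", result["StatusCode"])
except Exception:
    container.kill()  # timed out or errored: kill the runaway execution
    raise
finally:
    container.remove(force=True)
```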