
Published 23rd March 2026

Best platforms for untrusted code execution in 2026

TL;DR: What are the best platforms for untrusted code execution in 2026?

Running untrusted code means executing logic you did not write, did not review, and cannot fully predict. AI-generated code, user-submitted scripts, and LLM tool calls all fall into this category. The platform you use determines whether a bad run stays contained or becomes a security incident.

Northflank – The broadest isolation lineup available: Kata Containers with Cloud Hypervisor, Firecracker, and gVisor applied per workload. Full-stack platform with BYOC, unlimited sessions, databases, and GPUs alongside sandboxes.
E2B – Purpose-built for AI agent code execution with Firecracker microVM isolation and clean Python and TypeScript SDKs. Strong isolation, 24-hour session cap on Pro.
Modal – gVisor isolation with massive autoscaling to 20,000 concurrent containers. Python-first, no BYOC, the right call for ML-heavy workloads.
Fly.io Sprites – Persistent Firecracker microVMs with 100GB NVMe storage and idle billing. Built for long-running agent environments, not high-volume ephemeral execution.

Why isolation is the central question for untrusted code execution

Not all code is equally risky to run. The code your engineers wrote is reviewed and trusted. Code a user submits, or an AI agent generates at runtime, is none of those things. It could access files it should not, make unauthorised network requests, consume unbounded resources, or exploit a kernel vulnerability to escape the execution environment entirely.

Standard containers share the host kernel, meaning a vulnerability inside one can affect the host and every other tenant. MicroVMs like Firecracker and Kata Containers give each workload its own dedicated kernel. gVisor intercepts system calls in user space without the full VM overhead. The platform you choose here is a security decision as much as an infrastructure one.

What should you look for in a platform for untrusted code execution?

These are the dimensions that matter most when running code you do not control.

Isolation model. Containers, gVisor, and microVMs provide different levels of protection. For genuinely untrusted code, microVM isolation with a dedicated kernel per workload is the right default. Container isolation is insufficient when the threat model includes kernel exploits.
Multi-tenant design. If multiple users or agents are running code on shared infrastructure, tenant isolation must be enforced by default. Verify that workloads from different tenants cannot share resources, kernel state, or filesystem access.
Network controls. Untrusted code should not be able to make arbitrary outbound network requests. Look for platforms that support default-deny egress policies, outbound firewall rules, and the ability to whitelist specific endpoints.
Resource limits. Runaway code can consume CPU, memory, and disk. Per-sandbox resource caps prevent a single bad run from affecting other workloads or running up an unexpected bill.
Lifecycle controls. Ephemeral environments that are destroyed after each run prevent state accumulation between executions. Persistent environments introduce the risk of one run contaminating the next.
Observability. Logs, metrics, and audit trails matter when something goes wrong. You need to know what the code did, what resources it accessed, and what network requests it made.
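One way to keep these six dimensions in view is to encode them as a checklist you run against each sandbox configuration. The sketch below is a hypothetical, platform-neutral policy object: `SandboxPolicy`, `review`, and the tier names `"container"`, `"gvisor"`, and `"microvm"` are illustrative assumptions, not any vendor's API. It only flags weak settings; it enforces nothing.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical isolation tiers, ordered weakest to strongest.
ISOLATION_STRENGTH = {"container": 0, "gvisor": 1, "microvm": 2}


@dataclass
class SandboxPolicy:
    """Illustrative policy covering the six dimensions listed above."""
    isolation: str                         # "container", "gvisor" or "microvm"
    tenant_isolated: bool                  # tenants never share kernel state or filesystems
    egress_allowlist: Optional[List[str]]  # None means unrestricted outbound traffic
    cpu_limit: Optional[float]             # vCPUs; None means uncapped
    memory_limit_mb: Optional[int]         # None means uncapped
    ephemeral: bool                        # environment destroyed after each run
    audit_logging: bool                    # logs, metrics and audit trails enabled


def review(policy: SandboxPolicy, kernel_exploits_in_scope: bool = True) -> List[str]:
    """Return a warning for each dimension the policy leaves weak."""
    warnings = []
    if kernel_exploits_in_scope and ISOLATION_STRENGTH[policy.isolation] < 2:
        warnings.append("isolation: use a microVM when kernel exploits are in scope")
    if not policy.tenant_isolated:
        warnings.append("multi-tenancy: tenants must not share kernel or filesystem")
    if policy.egress_allowlist is None:
        warnings.append("network: outbound traffic is not default-deny")
    if policy.cpu_limit is None or policy.memory_limit_mb is None:
        warnings.append("resources: set per-sandbox CPU and memory caps")
    if not policy.ephemeral:
        warnings.append("lifecycle: persistent state can carry over between runs")
    if not policy.audit_logging:
        warnings.append("observability: enable logs and audit trails")
    return warnings
```

For example, `review(SandboxPolicy("microvm", True, ["pypi.org"], 1.0, 512, True, True))` returns an empty list, while a shared-kernel container with open egress and no limits trips every check.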

What are the best platforms for untrusted code execution?

1. Northflank

Northflank is a full-stack cloud platform with native support for untrusted code execution, accessible via UI, API, CLI, and GitOps. You define your sandbox environment once, specifying isolation model, storage, secrets, and lifecycle rules, then provision it from a CLI command, an API call, a Git trigger, or directly from an agent pipeline.

What separates Northflank for untrusted code specifically is the isolation choice: Kata Containers with Cloud Hypervisor, Firecracker, or gVisor, applied per workload based on your threat model. No other sandbox platform offers this breadth. Northflank's engineering team contributes to Kata Containers, QEMU, and Cloud Hypervisor upstream, so the isolation layer is actively maintained rather than bolted on.

Beyond isolation, Northflank is the only option here where sandboxes run alongside databases, APIs, background workers, and GPU workloads in the same control plane. Sessions run indefinitely with no platform-imposed cutoff. Any OCI-compliant image works without modification. BYOC deployment keeps execution inside your own AWS, GCP, Azure, Oracle, CoreWeave, Civo, on-premises, or bare-metal infrastructure, self-serve, no enterprise sales required.

cto.new migrated their entire sandbox infrastructure to Northflank in two days after EC2 metal instances made scaling costs unpredictable, going from unworkable provisioning to thousands of daily deployments with linear, per-second billing.

Key features:

Isolation options: Kata Containers with Cloud Hypervisor, Firecracker, and gVisor applied per workload. With Kata Containers or Firecracker, every sandbox runs in its own microVM with its own kernel, giving true multi-tenant isolation.
Any OCI image: Accepts any container from Docker Hub, GitHub Container Registry, or private registries without modification. No SDK-defined image constraints.
No session limits: Sandboxes run for seconds or weeks with no platform-imposed cutoff. Ephemeral and persistent environments in the same control plane.
Network controls: Granular egress policies and per-sandbox resource limits to constrain what untrusted code can access and consume.
Full-stack scope: Run databases, persistent volumes, background jobs, and GPU workloads alongside your sandboxes.
Managed or BYOC: Run on Northflank’s managed cloud or self-serve deployment into your own AWS, GCP, Azure, Oracle, CoreWeave, Civo, on-premises, or bare-metal.
SOC 2 Type 2 certified: Relevant for regulated industries running untrusted code at scale.

Best for: Production multi-tenant platforms running untrusted or AI-generated code, teams that need isolation flexibility across Kata Containers, Firecracker, and gVisor, and anyone who needs a full infrastructure stack alongside their sandboxes.

Pricing: $0.01667/vCPU-hour, $0.00833/GB-hour, H100 GPU at $2.74/hour all-inclusive. BYOC deployments bill against your own cloud account.

Get started on Northflank (self-serve, no demo required). Or book a demo with an engineer to walk through your isolation requirements.

2. E2B

E2B is purpose-built for AI agent code execution, and Firecracker microVM isolation is the default. Each sandbox runs in a dedicated lightweight VM with its own kernel, providing hardware-level separation between untrusted code and the host. Boot times sit around 150ms and the Python and TypeScript SDKs integrate cleanly with LangChain, OpenAI, and Anthropic tooling.

The 24-hour session cap on Pro (one hour on Hobby) limits E2B for longer-running workloads, but for most untrusted code execution patterns, where each run is short and self-contained, E2B covers the isolation requirement well. BYOC is available, but only for enterprise customers on AWS and GCP.

Best for: AI coding agents, Code Interpreter-style tools, and teams that need fast Firecracker microVM isolation with a clean SDK and do not need sessions beyond 24 hours.

Pricing: Hobby free with $100 one-time credit and 20 concurrent sandboxes. Pro at $150/month with 100 concurrent sandboxes and 24-hour sessions.

3. Modal

Modal uses gVisor for sandbox isolation. gVisor intercepts system calls in user space, reducing direct interaction with the host kernel without the full overhead of running a separate VM per workload. It is not as strong as Firecracker or Kata Containers for untrusted code, but it is significantly stronger than standard container isolation and is sufficient for many production workloads.

Where Modal earns its place in this list is scale. It handles 20,000 concurrent containers with sub-second cold starts, and companies like Lovable and Quora run millions of executions through it. Environments are defined dynamically through Modal's Python SDK at runtime, which suits agent workloads that need to assemble execution environments programmatically. No BYOC option.

Best for: Python-first teams running high-volume untrusted code execution alongside ML inference or data pipelines, where gVisor-level isolation is sufficient for the threat model.

Pricing: Starter is free with $30/month in credits and 100 concurrent containers. Team at $250/month with 1,000 containers. Sandbox CPU at $0.1419/core/hr.

4. Fly.io Sprites

Sprites runs on Firecracker microVMs with 100GB persistent NVMe storage per sandbox and checkpoint/restore in around 300ms. The Firecracker isolation provides the same hardware-level kernel separation as E2B, which makes it a legitimate option for untrusted code where strong isolation is required. The idle billing model stops charging when environments are not in use, preserving state indefinitely.

Sprites is better suited to persistent, long-running agent environments than to high-volume ephemeral untrusted code execution. Sandbox creation takes one to twelve seconds, which is too slow for use cases that need to spin up many sandboxes quickly. There is no BYOC option, and the platform is early-stage compared to the others here.

Best for: Untrusted code workloads that need strong Firecracker isolation and persistent state between sessions, particularly for individual developers or teams already on Fly.io.

Pricing: $0.07/CPU-hour and $0.04375/GB-hour, no charge when idle.

Which platform should you choose for untrusted code execution?

The isolation model is the deciding factor. If you are running code from external users, AI agents, or any source you do not fully trust, microVM isolation with a dedicated kernel per workload is the right default. Containers are not sufficient.

Northflank gives you the most flexibility with Kata Containers, Firecracker, and gVisor selectable per workload, alongside the full infrastructure stack. E2B gives you Firecracker by default with the cleanest SDK experience. Fly.io Sprites gives you Firecracker with persistent state. Modal gives you gVisor with exceptional scale.

Platform	Isolation	Default for untrusted code	BYOC	Session limit
Northflank	Kata Containers, Firecracker, gVisor	Strong (microVM)	Yes, self-serve	Unlimited
E2B	Firecracker	Strong (microVM)	AWS and GCP only, enterprise	24 hours
Modal	gVisor	Moderate (user-space kernel)	No	None
Fly.io Sprites	Firecracker	Strong (microVM)	No	None
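The decision rule above can be written down as a small lookup: derive the isolation tier your threat model needs, then keep only the platforms whose strongest tier meets it. This is a minimal sketch, assuming the tier names and per-platform mapping below, which are taken from the comparison table rather than from any vendor API.

```python
from typing import List

# Strength ordering of the isolation models discussed above.
STRENGTH = {"container": 0, "gvisor": 1, "microvm": 2}

# Strongest isolation each platform offers, per the comparison table.
PLATFORM_MAX_ISOLATION = {
    "Northflank": "microvm",      # Kata Containers / Firecracker (gVisor also available)
    "E2B": "microvm",             # Firecracker
    "Modal": "gvisor",            # user-space kernel
    "Fly.io Sprites": "microvm",  # Firecracker
}


def required_isolation(code_is_trusted: bool, kernel_exploits_in_scope: bool = True) -> str:
    """MicroVM by default for untrusted code; gVisor only if kernel exploits are out of scope."""
    if code_is_trusted:
        return "container"
    return "microvm" if kernel_exploits_in_scope else "gvisor"


def candidate_platforms(required: str) -> List[str]:
    """Platforms whose strongest isolation tier meets or exceeds the requirement."""
    return sorted(
        name for name, tier in PLATFORM_MAX_ISOLATION.items()
        if STRENGTH[tier] >= STRENGTH[required]
    )
```

For untrusted code with kernel exploits in scope, `candidate_platforms(required_isolation(False))` returns `["E2B", "Fly.io Sprites", "Northflank"]`; relaxing the threat model to gVisor-level adds Modal.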

How do platforms for untrusted code execution compare on pricing?

Pricing as of April 2026. Billing models differ across platforms (some bill based on active CPU usage only, others bill for the entire duration the sandbox is running). Verify current rates on each platform's pricing page before making cost decisions.

Platform	CPU	Memory	Storage	GPU	Billing model
Northflank	$0.01667/vCPU-hr	$0.00833/GB-hr	$0.15/GB-month	L4: $0.80/hr, A100 40GB: $1.42/hr, A100 80GB: $1.76/hr, H100: $2.74/hr, H200: $3.14/hr	Per second
E2B	$0.0504/vCPU-hr	$0.0162/GiB-hr	10–20GB included free	Not offered	Per second
Fly.io Sprites	$0.07/CPU-hr	$0.04375/GB-hr	$0.00068/GB-hr (hot NVMe)	Not offered	Per second, actual cgroup usage. No charge when idle
Modal Sandboxes	$0.1419/physical core-hr (2 vCPU)	$0.0242/GiB-hr	—	L4: $0.80/hr, A100 40GB: $2.10/hr, A100 80GB: $2.50/hr, H100: $3.95/hr, H200: $4.54/hr	Per second

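Because every platform here bills per second, a sandbox's compute cost is simply hours × (vCPUs × CPU rate + GB × memory rate). The sketch below applies the list prices from the table to one fixed-size sandbox. Assumptions: Modal's per-physical-core rate is halved to a per-vCPU rate (the table equates one core to 2 vCPU), GiB and GB are treated as equal, and storage, GPUs, free credits, and plan fees are ignored. Sprites bills actual cgroup usage, so its figure here is an upper bound at full utilisation.

```python
# List prices from the table above, in USD per hour.
RATES = {
    "Northflank":     {"vcpu_hr": 0.01667,    "gb_hr": 0.00833},
    "E2B":            {"vcpu_hr": 0.0504,     "gb_hr": 0.0162},   # memory billed per GiB
    "Fly.io Sprites": {"vcpu_hr": 0.07,       "gb_hr": 0.04375},  # upper bound: bills actual usage
    "Modal":          {"vcpu_hr": 0.1419 / 2, "gb_hr": 0.0242},   # $0.1419 per core = 2 vCPU
}


def compute_cost(platform: str, vcpus: float, memory_gb: float, seconds: float) -> float:
    """CPU + memory cost of one sandbox held at a fixed size under per-second billing."""
    rate = RATES[platform]
    hours = seconds / 3600
    return hours * (vcpus * rate["vcpu_hr"] + memory_gb * rate["gb_hr"])


# Cost of a 2 vCPU / 4 GB sandbox running for one hour on each platform.
for name in RATES:
    print(f"{name:15} ${compute_cost(name, 2, 4, 3600):.4f}")
```

For a 2 vCPU / 4 GB sandbox held for an hour, this works out to roughly $0.067 on Northflank, $0.166 on E2B, $0.239 on Modal, and at most $0.315 on Sprites.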
BYOC support across untrusted code execution platforms

The table below shows how each platform handles BYOC deployment, which clouds are supported, and whether it requires a sales process.

Platform	BYOC available	Clouds supported	Access model	Pricing model
Northflank	Yes, fully self-serve	AWS, GCP, Azure, Oracle, CoreWeave, Civo, other neoclouds, bare-metal, on-premises	Self-serve, enterprise contracts available for larger commits (with bulk discounts)	Your existing cloud bill, CPU $0.01389/vCPU-hr and Memory $0.00139/GB-hr
E2B	Yes, limited and not self-serve	AWS and GCP only	Not publicly disclosed; contact sales	Starts at $50/sandbox/month, on top of your existing cloud bill
Modal	No	Managed only	—	—
Fly.io Sprites	No	Managed only	—	—

FAQ: untrusted code execution platforms

What makes code untrusted?

Code is untrusted when you did not write it and cannot fully predict or audit what it will do at runtime. This includes user-submitted scripts, AI-generated code, LLM tool calls, and code from third-party plugins or integrations. Running untrusted code requires isolation strong enough to contain misbehaviour, whether intentional or accidental.

Why are containers not enough for untrusted code execution?

Containers share the host kernel, separated only by Linux namespaces and cgroups. A kernel vulnerability exploited inside a container can let an attacker escape to the host and affect other tenants. MicroVMs give each workload a dedicated guest kernel, so a kernel exploit inside the sandbox compromises only that guest; reaching the host would additionally require escaping the hypervisor. For code you do not fully trust, that hardware-enforced boundary is the difference between a contained incident and a serious breach.

What is the difference between Firecracker and gVisor for untrusted code?

Firecracker runs each workload inside a lightweight VM with its own kernel, providing hardware-level isolation. gVisor intercepts system calls in user space and reimplements a subset of Linux kernel behaviour, reducing direct interaction with the host kernel without the overhead of a full VM. Firecracker provides stronger isolation; gVisor provides a lighter-weight middle ground between containers and full microVMs.

Can I run untrusted code in a multi-tenant system safely?

Yes, but only with the right isolation model. For multi-tenant untrusted code execution, you need microVM isolation so that one tenant's workload cannot affect another's kernel state or filesystem. You also need per-sandbox resource limits, network egress controls, and ephemeral environments that are destroyed after each run. Northflank, E2B, and Fly.io Sprites all provide microVM isolation by default for multi-tenant workloads.
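The "destroyed after each run" property can be illustrated at the process level: give every run a fresh scratch directory and delete it when the run finishes, so nothing written by one run is visible to the next. This is a minimal sketch with hypothetical helper names (`ephemeral_workspace`, `run_once`). It demonstrates lifecycle hygiene only and is not a security boundary; in production the microVM itself is what gets discarded.

```python
import subprocess
import sys
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_workspace() -> Iterator[Path]:
    """Yield a fresh scratch directory that is deleted when the run ends."""
    with tempfile.TemporaryDirectory(prefix="sandbox-run-") as path:
        yield Path(path)


def run_once(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run a snippet inside its own throwaway directory, then discard the directory."""
    with ephemeral_workspace() as workdir:
        script = workdir / "main.py"
        script.write_text(code)
        return subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,            # relative file writes land in the throwaway dir
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
```

A file written by one `run_once` call, such as `open('state.txt', 'w')`, is gone by the next call, because each run starts in a new directory.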

What network controls should I apply to untrusted code?

At a minimum, apply a default-deny egress policy so sandboxed code cannot make arbitrary outbound requests. Whitelist only the specific endpoints the code needs to reach. Disable inbound connections unless required. Some platforms, like Northflank and Modal, expose granular network controls directly. Others require you to configure networking at the infrastructure level.
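The core of a default-deny egress policy is a single decision: allow a connection only if its destination is explicitly listed, and refuse everything else, including malformed URLs. The sketch below shows that decision logic as an egress proxy or firewall controller might apply it. `ALLOWED_ENDPOINTS` and `egress_allowed` are hypothetical names, and real enforcement must happen outside the sandbox at the network layer, since untrusted code can bypass any check that runs inside it.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only (host, port) pairs sandboxed code may reach.
ALLOWED_ENDPOINTS = {("pypi.org", 443), ("files.pythonhosted.org", 443)}


def egress_allowed(url: str) -> bool:
    """Default-deny: permit a request only if its host and port are allowlisted."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or parts.hostname is None:
        return False  # unknown schemes and unparseable hosts are denied
    try:
        explicit_port = parts.port
    except ValueError:
        return False  # invalid port numbers are denied
    port = explicit_port or (443 if parts.scheme == "https" else 80)
    return (parts.hostname.lower(), port) in ALLOWED_ENDPOINTS
```

With this allowlist, `https://pypi.org/simple/requests/` is permitted, while plain HTTP to the same host, a non-standard port, or any unlisted domain is refused.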

How do I prevent untrusted code from consuming unbounded resources?

Apply per-sandbox CPU, memory, and disk limits. Most sandbox platforms expose these controls at the API or configuration level. Set a maximum execution time to prevent infinite loops from running indefinitely. Northflank supports autoscaling with configurable resource thresholds and per-workload cost tracking.
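On a single POSIX host, the same three controls (CPU cap, memory cap, disk-write cap) plus a wall-clock deadline can be sketched with the standard `resource` and `subprocess` modules. This is an illustrative, Linux-oriented assumption rather than how any listed platform implements limits; those enforce caps at the VM or cgroup level. Note that `preexec_fn` is not safe to use from multi-threaded parents.

```python
import resource
import subprocess
import sys


def limit_resources(cpu_seconds: int, memory_bytes: int, max_file_bytes: int):
    """Build a hook that applies rlimits in the child process before it executes."""
    def apply() -> None:
        # CPU time: the kernel sends SIGXCPU once the limit is reached.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Address space: allocations beyond this fail (MemoryError in Python).
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
        # Largest file the process may write.
        resource.setrlimit(resource.RLIMIT_FSIZE, (max_file_bytes, max_file_bytes))
    return apply


def run_limited(code: str, wall_timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run a snippet with 2 s CPU, 512 MiB memory and 10 MiB file-size caps."""
    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=limit_resources(2, 512 * 1024**2, 10 * 1024**2),
        capture_output=True,
        text=True,
        timeout=wall_timeout_s,  # kills the child and raises TimeoutExpired if exceeded
    )
```

The CPU limit catches busy infinite loops, while the wall-clock timeout also catches code that blocks without using CPU, such as a `sleep` or a hung network call.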

Conclusion

Untrusted code execution is one of the few infrastructure decisions where getting the security model wrong has immediate and serious consequences. The isolation model determines whether a bad run stays contained inside a sandbox or escapes to affect your host, your other tenants, or your production systems. For production multi-tenant platforms running genuinely untrusted code, microVM isolation is the minimum bar, and containers alone do not meet it.

Northflank is the strongest option for teams that need isolation flexibility, a full infrastructure stack, and no concurrency caps. E2B is the right call for teams that want Firecracker out of the box with a clean SDK. Fly.io Sprites suits long-running agent environments where persistent state matters. Modal covers gVisor at a scale few platforms can match.

You can get started for free on Northflank or talk to the team to walk through your untrusted code execution requirements.

If you want to go deeper on the topics covered in this guide, these articles are a good next step.

How to sandbox AI agents: microVMs, gVisor, and isolation strategies: Technical deep-dive into isolation technologies and how to choose between Firecracker, Kata Containers, and gVisor based on your threat model.
Self-hosted AI sandboxes: guide to secure code execution: Covers deployment models for teams that need execution inside their own infrastructure.
Remote code execution sandbox: secure isolation at scale: Architecture guide covering isolation models, security controls, and what production-grade untrusted code execution actually requires.
