Published 27th March 2026

Best sandbox runners for AI agents and code execution in 2026

TL;DR: Best sandbox runners in 2026

Sandbox runners are isolated execution environments for running code safely, whether from AI agents, user submissions, or untrusted scripts. The right one depends on what isolation model you need, whether your workload requires persistent state, and how much infrastructure you want the platform to handle.

Sandbox runners range from container-based environments to microVM-backed platforms with hardware-level isolation. The isolation model determines how safely you can run untrusted code at scale.
For teams building AI products, the most important evaluation criteria are isolation strength, ephemeral vs persistent support, BYOC (Bring Your Own Cloud) availability, and whether the platform covers the full workload runtime alongside sandbox execution.
Northflank provides production-grade sandbox infrastructure backed by Firecracker, Kata Containers, and gVisor. It offers both ephemeral and persistent environments with no forced time limits; self-serve BYOC across AWS, GCP, Azure, Oracle, CoreWeave, Civo, bare-metal, and on-premises infrastructure; SOC 2 Type 2 compliance; on-demand GPU support; and a full workload runtime for APIs, workers, databases, and jobs alongside sandboxes. Northflank has been running this class of workload in production since 2021 across startups, public companies, and government deployments.

Sandbox runners cover a wider range of tools than most comparisons acknowledge, from browser-based execution environments to production-grade microVM platforms running at scale.

This article compares the best sandbox runners in 2026 across isolation model, persistence, BYOC (Bring Your Own Cloud) support, GPU access, and platform scope, so you can match the right one to your use case.

What is a sandbox runner?

A sandbox runner is an isolated execution environment where code runs without affecting your host system, other tenants, or production infrastructure. The isolation boundary determines the security model: standard Linux containers share the host kernel and rely on namespace separation, while microVMs (Firecracker, Kata Containers) give each workload a dedicated kernel, and gVisor intercepts system calls in user space to reduce the kernel attack surface.

Sandbox runners are used for AI agent code execution, user-submitted scripts, code interpretation, and any workload where you cannot trust the code being run. See "What is a sandbox environment?" and "What is an AI sandbox?" for deeper breakdowns.

What should you look for in a sandbox runner?

The evaluation criteria depend on your workload, but these questions are worth working through before committing to a platform:

What isolation model does it use? Containers, gVisor, and microVMs offer meaningfully different security guarantees. For untrusted code at scale, microVM-level isolation provides a dedicated kernel per workload. See Kata Containers vs Firecracker vs gVisor for a technical breakdown.
Does it support both ephemeral and persistent environments? Ephemeral environments handle stateless, short-lived execution. Persistent environments let agents maintain state across sessions without rebuilding from scratch. Many platforms support only one model.
Can it deploy inside your own infrastructure? If workloads cannot leave your network for compliance or data residency reasons, check whether BYOC (Bring Your Own Cloud) is available self-serve or only on an enterprise plan requiring a sales process.
Does it cover the full workload runtime? A sandbox API is not the same as a full platform. If you need databases, background workers, GPU jobs, and production services alongside sandbox execution, check whether the platform covers all of this or only code execution.
Does it support GPU workloads? Not all sandbox runners include GPU access. Confirm this is available within the same platform if your workloads require it.
What compliance certifications does it hold? SOC 2, HIPAA, and GDPR coverage varies across platforms.

What are the best sandbox runners in 2026?

Each platform below takes a different approach to sandbox execution. Here is what they provide and where they fit.

1. Northflank

Northflank provides production-grade sandbox infrastructure backed by Firecracker, Kata Containers, and gVisor, with orchestration, multi-tenant isolation, autoscaling, and bin-packing handled at the infrastructure level. It is the only platform in this list that covers sandboxed code execution alongside production deployments, databases, and GPU workloads in one control plane.

Key capabilities:

Firecracker, Kata Containers, and gVisor applied depending on the workload
Both ephemeral and persistent environments with no forced time limits
End-to-end sandbox creation in 1-2 seconds, measured across the full stack
Self-serve BYOC (Bring Your Own Cloud) across AWS EKS, GKE, AKS, Oracle Kubernetes, CoreWeave, Civo, bare-metal, and on-premises distributions including OpenShift and RKE2, or run on Northflank's managed cloud
On-demand GPU access (NVIDIA H100, A100, L4, and others) with no quota requests
Full workload runtime: APIs, workers, databases, and background jobs run alongside sandboxes in the same control plane
API, CLI, and SSH access
Multi-tenant architecture
SOC 2 Type 2 certified, in production since 2021 across startups, public companies, and government deployments
CPU at $0.01667/vCPU-hour, memory at $0.00833/GB-hour. See full GPU and compute pricing

Cost comparison at scale

To make the pricing difference concrete, here is what 200 sandboxes cost per month across providers under the same conditions.

Based on 200 sandboxes, plan: nf-compute-100-4, infra node: m7i.2xlarge

Model	Provider	Cloud cost	Sandbox vendor cost	Total
PaaS	Northflank	—	$7,200.00	$7,200.00
PaaS	E2B	—	$16,819.20	$16,819.20
PaaS	Modal	—	$24,491.50	$24,491.50
PaaS	Fly Sprites	—	$35,770.00	$35,770.00
PaaS	Vercel Sandbox	—	$31,068.80	$31,068.80
BYOC (0.2 overcommit)*	Northflank	$1,500.00	$560.00	$2,060.00
BYOC	E2B	$1,500.00	$10,000.00	$11,500.00

*Through Northflank's plans on BYOC, there's a default overcommit which allows a customer to spawn more services and sandboxes on the same amount of compute. A request modifier of 0.2 means each sandbox only requests 20% of its plan's resources as a guaranteed minimum, but can burst up to the full plan limit if there's available capacity on the node. So instead of fitting 8 sandboxes per node, you could fit 40 on the same hardware, reducing both infrastructure cost and the Northflank management fee.
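
The packing arithmetic in the note above can be sketched in a few lines. This is a minimal illustration, not Northflank's scheduler: it assumes an m7i.2xlarge node exposes 8 vCPUs and 32 GiB, that the nf-compute-100-4 plan means 1 vCPU and 4 GiB, and it ignores capacity reserved for the operating system and cluster components. The function names are hypothetical.

```python
import math

def sandboxes_per_node(node_mcpu, node_mem_mib, plan_mcpu, plan_mem_mib, request_percent=100):
    """Count how many sandboxes fit on one node when each sandbox
    requests only request_percent of its plan as a guaranteed minimum."""
    req_mcpu = plan_mcpu * request_percent // 100
    req_mem_mib = plan_mem_mib * request_percent // 100
    # The scarcer of CPU and memory decides how many sandboxes fit.
    return min(node_mcpu // req_mcpu, node_mem_mib // req_mem_mib)

def nodes_needed(total_sandboxes, per_node):
    """Round up: a partly filled node still has to be paid for."""
    return math.ceil(total_sandboxes / per_node)

# Assumed shapes (not stated in the article): node = 8 vCPU / 32 GiB,
# plan = 1 vCPU / 4 GiB, expressed in millicores and MiB.
NODE = (8000, 32 * 1024)
PLAN = (1000, 4 * 1024)

no_overcommit = sandboxes_per_node(*NODE, *PLAN, request_percent=100)
with_overcommit = sandboxes_per_node(*NODE, *PLAN, request_percent=20)

print(no_overcommit, nodes_needed(200, no_overcommit))      # 8 25
print(with_overcommit, nodes_needed(200, with_overcommit))  # 40 5
```

Under these assumptions, 200 sandboxes need 25 nodes without overcommit and 5 with a 0.2 request modifier. The trade-off is that bursting above the guaranteed 20% only works while neighbouring sandboxes leave capacity free.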

Northflank is the right choice when you need isolation guarantees beyond containers, want to avoid managing separate infrastructure for execution and production, or require workloads to stay within your own cloud under compliance constraints.

2. E2B

E2B provides isolated sandbox environments for AI agents and code execution, with Python and JavaScript SDKs.

Key capabilities:

Isolated Linux VMs created on demand via API
Pause and resume with full state preserved (filesystem and memory)
Paused sandboxes retained indefinitely with no automatic deletion
Continuous runtime limit of 24 hours (Pro) or 1 hour (Base) per session, reset on pause and resume
AutoResume for automatic sandbox resumption on network reconnection
Snapshots for saving and restoring sandbox state
SSH access, interactive terminal, proxy tunneling, and custom domain support
Git integration and cloud storage bucket connectivity
MCP gateway
BYOC available on Enterprise for AWS and GCP only (requires contacting sales)

3. Modal

Modal is a serverless compute platform with a sandbox interface for executing untrusted or dynamically defined code.

Key capabilities:

gVisor-based sandbox isolation
Sandbox environments defined and spawned at runtime with custom container images
Sandbox timeouts configurable up to 24 hours, with Filesystem Snapshots for longer workflows
GPU access configurable per sandbox
Tunnels for direct external connections and granular egress network policies
Filesystem snapshots for state preservation and restoration
Python SDK (primary), JavaScript and Go SDKs
No BYOC deployment option

4. Fly.io Sprites

Sprites are persistent, hardware-isolated Linux environments built on Fly.io's infrastructure.

Key capabilities:

Firecracker microVM isolation per Sprite
Persistent ext4 filesystem backed by NVMe hot storage during execution and durable object storage at rest
Sprites are created in approximately 1-2 seconds
Automatic idle behaviour: compute charges stop when idle, filesystem is preserved
Warm and cold states: warm Sprites resume quickly from hibernation
Checkpoints with copy-on-write (approximately 300ms, non-disruptive to the running environment)
Unique HTTPS URL per Sprite for exposing services or APIs
Up to 8 vCPUs and 16GB RAM per Sprite
CLI, JavaScript, and Go SDKs
No BYOC

5. Vercel Sandbox

Vercel Sandbox provides on-demand, isolated microVM environments for running untrusted code, tightly integrated with Vercel's deployment infrastructure.

Key capabilities:

Firecracker microVM isolation
Node.js 22 and Python 3.13 runtimes on Amazon Linux 2023
Session limits: 5 minutes default, up to 45 minutes on Hobby, up to 5 hours on Pro and Enterprise
Snapshotting for saving and restoring sandbox state
Up to 8 vCPUs and 2GB RAM per vCPU
Active CPU billing only (billed when code is actively running)
TypeScript and Python SDKs, CLI
Runs on Vercel's infrastructure only, no BYOC

6. Cloudflare Sandbox

Cloudflare Sandbox provides isolated Linux container environments for running untrusted code, built on Cloudflare Containers and Durable Objects. It is currently in Beta and available on the Workers Paid plan.

Key capabilities:

Isolated Linux containers (not microVMs), each with a full Ubuntu environment
State is maintained while the container is active; state resets after inactivity (10 minutes by default, configurable)
Python and Node.js code interpreter with persistent execution contexts while active
Docker-in-Docker support
Preview URLs via automatic subdomain routing
WebSocket support for real-time streaming
Browser terminal access
S3-compatible object storage mounting (R2, S3, GCS) for persistence across sessions
TypeScript SDK
Integrates with Cloudflare Workers, R2, KV, and Workers AI
No BYOC

7. CodeSandbox

CodeSandbox, now part of Together AI, provides microVM-based sandbox environments for AI agents, code interpretation, and developer workflows.

Key capabilities:

MicroVM infrastructure with snapshot and restore
VM restore within 2 seconds
Sandbox state persistence across sessions via snapshots
Customisable hibernation periods
CodeSandbox SDK for programmatic sandbox management
Supports AI agents, development environments, code interpretation, and CI/CD
No self-serve BYOC (a custom dedicated cluster is available on the Enterprise plan via sales)

Which sandbox runner fits your use case?

The right platform depends on your primary requirement. Use the table below to narrow down your options.

If you need...	Consider...
MicroVM or gVisor isolation (Firecracker, Kata Containers, gVisor) with self-serve BYOC (Bring Your Own Cloud)	Northflank
Both ephemeral and persistent environments with no forced time limits	Northflank
Full workload runtime alongside sandboxes (databases, APIs, workers, GPU)	Northflank
On-demand GPU support within the same platform as sandboxes	Northflank
SOC 2 Type 2 compliance with self-serve BYOC deployment	Northflank
API-driven sandbox execution with pause, resume, and AutoResume	E2B
gVisor-based isolation with runtime-defined environments and GPU access	Modal
Persistent Linux environments with automatic idle behaviour and checkpointing	Fly.io Sprites
Short-lived Firecracker microVM execution within the Vercel ecosystem	Vercel Sandbox
Container-based sandboxes integrated with Cloudflare's developer platform	Cloudflare Sandbox
MicroVM sandboxes with snapshot and restore for AI agents and code playgrounds	CodeSandbox
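
The table above can also be read as a filter over a few attributes. The sketch below encodes only facts stated in this article (isolation model, BYOC access, GPU availability) and narrows the list by hard requirements. The `Runner` type and `shortlist` function are hypothetical names for illustration, and the data should be re-checked against each vendor's documentation before relying on it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Runner:
    name: str
    isolation: str  # "microVM", "microVM/gVisor", "gVisor", "VM", or "container"
    byoc: str       # "self-serve", "enterprise", or "none"
    gpu: bool       # GPU availability as listed in this article's pricing table

RUNNERS = [
    Runner("Northflank", "microVM/gVisor", "self-serve", True),
    Runner("E2B", "VM", "enterprise", False),
    Runner("Modal", "gVisor", "none", True),
    Runner("Fly.io Sprites", "microVM", "none", False),
    Runner("Vercel Sandbox", "microVM", "none", False),
    Runner("Cloudflare Sandbox", "container", "none", False),
    # The BYOC table lists an enterprise-only dedicated cluster for CodeSandbox.
    Runner("CodeSandbox", "microVM", "enterprise", False),
]

def shortlist(gpu=False, self_serve_byoc=False, exclude_containers=False):
    """Return the names of runners that satisfy every requested requirement."""
    names = []
    for r in RUNNERS:
        if gpu and not r.gpu:
            continue
        if self_serve_byoc and r.byoc != "self-serve":
            continue
        if exclude_containers and r.isolation == "container":
            continue
        names.append(r.name)
    return names

print(shortlist(gpu=True))              # ['Northflank', 'Modal']
print(shortlist(self_serve_byoc=True))  # ['Northflank']
```

Treating requirements as hard filters keeps the comparison honest: softer preferences such as SDK language or session limits are better weighed by hand after the list has been narrowed.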

How do sandbox runners compare on pricing?

Pricing as of April 2026. Billing models differ across platforms (some bill based on active CPU usage only, others bill for the entire duration the sandbox is running). Verify current rates on each platform's pricing page before making cost decisions.

Platform	CPU	Memory	Storage	GPU	Billing model
Northflank	$0.01667/vCPU-hr	$0.00833/GB-hr	$0.15/GB-month	L4: $0.80/hr, A100 40GB: $1.42/hr, A100 80GB: $1.76/hr, H100: $2.74/hr, H200: $3.14/hr	Per second
Fly.io Sprites	$0.07/CPU-hr	$0.04375/GB-hr	$0.00068/GB-hr (hot NVMe)	No GPU compute	Per second, actual cgroup usage. No charge when idle
Cloudflare Sandbox	$0.072/vCPU-hr	$0.009/GiB-hr	$0.000252/GB-hr	No GPU compute	Active CPU; memory and disk provisioned. Requires $5/month Workers Paid plan
CodeSandbox	$0.075/core-hr (credit-based: $0.015/credit)	Bundled with VM tier	Included	No GPU compute	Credit-based ($0.015/credit)
E2B	$0.0504/vCPU-hr	$0.0162/GiB-hr	10–20GB included free	No GPU compute	Per second
Modal Sandboxes	$0.1419/physical core-hr (2 vCPU)	$0.0242/GiB-hr	—	L4: $0.80/hr, A100 40GB: $2.10/hr, A100 80GB: $2.50/hr, H100: $3.95/hr, H200: $4.54/hr	Per second
Vercel Sandbox	$0.128/vCPU-hr	$0.0212/GB-hr	$0.023/GB-month (snapshots)	No GPU compute	Active CPU only
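
Because billing models differ, the hourly rates above are not directly comparable until they are applied to a workload. The sketch below is a rough estimator, not any vendor's billing formula: it assumes a 1 vCPU / 4 GB sandbox, a 720-hour month, and that active-CPU billing charges CPU only for the busy share of wall-clock time while memory is charged for the full duration (as the Cloudflare row describes). Storage, plan fees, and GB versus GiB differences are ignored, and `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(vcpu, mem_gb, cpu_rate, mem_rate, hours=720.0, cpu_active_fraction=1.0):
    """Estimate one sandbox's monthly compute cost in USD.

    cpu_active_fraction models active-CPU billing: CPU is charged only for
    the share of wall-clock time it is busy, while memory is charged for the
    whole duration. Leave it at 1.0 for plain wall-clock billing.
    """
    if not 0.0 <= cpu_active_fraction <= 1.0:
        raise ValueError("cpu_active_fraction must be between 0 and 1")
    cpu_cost = vcpu * cpu_rate * hours * cpu_active_fraction
    mem_cost = mem_gb * mem_rate * hours
    return cpu_cost + mem_cost

# Rates are copied from the table above; the sandbox shape, the 720-hour
# month, and the 10% activity level are assumptions for illustration.
estimates = {
    "Northflank, always on": monthly_cost(1, 4, 0.01667, 0.00833),
    "Cloudflare, CPU busy 100%": monthly_cost(1, 4, 0.072, 0.009),
    "Cloudflare, CPU busy 10%": monthly_cost(1, 4, 0.072, 0.009, cpu_active_fraction=0.1),
}

for label, cost in estimates.items():
    print(f"{label}: ${cost:.2f}/month")
```

The spread between the two Cloudflare estimates shows why the billing model matters as much as the headline rate: an agent that mostly waits on network calls is far cheaper under active-CPU billing than one that keeps a core saturated.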

BYOC support across sandbox runners

The table below shows how each platform handles BYOC deployment, which clouds are supported, and whether it requires a sales process.

Platform	BYOC available	Clouds supported	Access model	Pricing model
Northflank	Yes, fully self-serve	AWS, GCP, Azure, Oracle, CoreWeave and other neoclouds, Civo, bare-metal, on-premises	Self-serve; enterprise contracts with bulk discounts available for larger commits	Your existing cloud bill, plus CPU at $0.01389/vCPU-hr and memory at $0.00139/GB-hr (before the overcommit request modifier; see the note under the cost comparison)
E2B	Yes, limited and not self-serve	AWS and GCP only	Not publicly disclosed; requires contacting sales	Starts at $50/sandbox/month, on top of your existing cloud bill
Modal	No	Managed only	—	—
Fly.io Sprites	No	Managed only	—	—
Vercel Sandbox	No	Managed only (iad1 region only)	—	—
Cloudflare Sandbox	No	Managed only	—	—
CodeSandbox	Enterprise only	Custom dedicated cluster	Enterprise plan, contact sales	Custom

FAQ: Common questions about sandbox runners

The questions below cover what engineering teams most commonly ask when evaluating sandbox runners.

What is the difference between a sandbox runner and a development environment?

A sandbox runner is designed to execute arbitrary or untrusted code safely, typically for AI agents, user-submitted scripts, or code interpretation. A development environment is designed for developers to write, run, and iterate on code they own. Some platforms serve both purposes, but the security requirements and isolation models differ significantly between the two use cases.

Which sandbox runners support self-serve BYOC (Bring Your Own Cloud)?

Northflank supports BYOC self-serve across AWS EKS, GKE, AKS, Oracle Kubernetes, CoreWeave, Civo, bare-metal, and on-premises infrastructure, including OpenShift and RKE2. E2B BYOC is available on Enterprise for AWS and GCP only and requires contacting their team. Modal, Vercel Sandbox, Cloudflare Sandbox, and Fly.io Sprites run on the vendor's infrastructure only.

Which sandbox runners support persistent environments?

Northflank supports both ephemeral and persistent environments with no forced time limits. Fly.io Sprites maintain a persistent ext4 filesystem across sessions with automatic idle behaviour. E2B supports persistent state via pause and resume, with continuous runtime limits per session that reset on pause. CodeSandbox supports persistence via snapshots with VM restore within 2 seconds. Modal supports snapshot-based state preservation. Cloudflare Sandbox state resets after inactivity unless S3-compatible object storage is mounted. Vercel Sandbox sessions run up to 5 hours on Pro and Enterprise with snapshotting available.

Which sandbox runners support GPU workloads?

Northflank supports on-demand GPU workloads (NVIDIA H100, A100, L4, and others) within the same platform as sandbox execution. Modal also provides GPU access configurable per sandbox.

The articles below go deeper on isolation technologies, deployment models, and sandbox infrastructure covered in this guide.

What is an AI sandbox?: A detailed explainer on AI sandbox infrastructure, isolation models, and use cases.
Top AI sandbox platforms for code execution: A full ranked comparison of AI sandbox platforms with pricing, isolation, and session lifecycle breakdowns.
Kata Containers vs Firecracker vs gVisor: A technical comparison of the isolation technologies used across the sandbox runners in this guide.
Top BYOC AI sandboxes: A comparison of sandbox providers that support deployment inside your own cloud infrastructure.
Self-hosted AI sandboxes: Covers the three deployment models for running sandbox infrastructure in your own infrastructure.
Ephemeral sandbox environments: Explains the tradeoffs between ephemeral and persistent sandbox models and when each fits the workload.
