

Agent Sandbox on Kubernetes: how it works and how to run it in production
Agent sandbox on Kubernetes refers to a specific open-source project under Kubernetes SIG Apps (kubernetes-sigs/agent-sandbox) that provides a declarative, CRD-based API for running isolated, stateful AI agent workloads on Kubernetes.
- Agent sandbox fills a gap that raw Kubernetes primitives do not cover natively: managing long-running, stateful, singleton workloads with stable identity, lifecycle controls (pause, resume, scheduled deletion), and strong isolation for untrusted code execution.
- The project supports gVisor and Kata Containers as isolation backends, both of which provide stronger isolation than standard container namespacing.
Northflank provides production-grade sandbox infrastructure backed by Firecracker, Kata Containers, and gVisor, with both ephemeral and persistent environments, self-serve BYOC across AWS, GCP, Azure, Oracle, CoreWeave, Civo, bare-metal, and on-premises infrastructure, SOC 2 Type 2 compliance, GPU support, and a full workload runtime for APIs, workers, databases, and jobs alongside sandboxes. Northflank has been running this class of workload in production since 2021.
The agent sandbox project on Kubernetes formalises infrastructure patterns that platform engineers running AI workloads have been assembling manually.
This article covers how the project works, what gap it fills over raw Kubernetes primitives, and what the operational reality looks like when running agent sandboxes in production.
Agent sandbox is an open-source Kubernetes controller and set of CRDs developed under Kubernetes SIG Apps, hosted at kubernetes-sigs/agent-sandbox.
It provides a declarative, standardised API for managing workloads that require the characteristics of a long-running, stateful, singleton container with a stable identity, much like a lightweight, single-container VM experience built on Kubernetes primitives.
As AI applications shift from short-lived inference requests to long-running autonomous agents that maintain context, execute code, and interact with tools, mapping those workloads onto existing Kubernetes primitives requires workarounds that the Sandbox CRD is designed to replace.
Kubernetes excels at two workload models: stateless, replicated applications managed by Deployments, and stable, numbered sets of stateful pods managed by StatefulSets. Agent workloads fit neither model cleanly.
An AI agent runtime is typically a singleton: one isolated environment per user session or task, not a replicated pool. It needs persistent storage that survives restarts, a stable hostname and network identity, and lifecycle controls that let it be paused when idle and resumed without losing state. It also executes code that may be untrusted, which requires isolation beyond standard container namespacing.
Before the agent sandbox project, the closest approximation using raw Kubernetes primitives required combining a StatefulSet of size 1, a headless Service, and a PersistentVolumeClaim. This approach lacks specialised lifecycle management like hibernation. Instead of one resource modelling the workload, you are assembling three or more, with no built-in support for pause, resume, warm pools, or scheduled deletion.
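Concretely, that manual approximation looks something like the following sketch. The names, image, and storage size are placeholders, not values from the agent sandbox project:

```yaml
# Headless Service: gives the singleton pod a stable DNS identity
apiVersion: v1
kind: Service
metadata:
  name: agent-session
spec:
  clusterIP: None
  selector:
    app: agent-session
---
# StatefulSet of size 1: the singleton workload itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: agent-session
spec:
  serviceName: agent-session
  replicas: 1                      # one environment per session, not a pool
  selector:
    matchLabels:
      app: agent-session
  template:
    metadata:
      labels:
        app: agent-session
    spec:
      containers:
        - name: agent
          image: <IMAGE>
          volumeMounts:
            - name: workspace
              mountPath: /workspace
  volumeClaimTemplates:
    # PVC: storage that survives pod restarts
    - metadata:
        name: workspace
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Nothing in these three resources expresses pause, resume, warm pools, or scheduled deletion; those behaviours have to be scripted externally, which is the gap the Sandbox CRD closes.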
Agent sandbox adds a consumption layer on top of Kubernetes primitives designed specifically for agent workload patterns.
See Kubernetes multi-tenancy for more on how Kubernetes handles workload isolation at scale.
Agent sandbox follows the standard Kubernetes controller pattern. You create a Sandbox custom resource, and the controller manages the underlying runtime resources. The core CRD and its extensions are:
- Sandbox: The core resource. It provides a declarative API for managing a single, stateful pod with stable identity and persistent storage, including a stable hostname and network identity, persistent storage that survives restarts, and lifecycle management covering creation, scheduled deletion, pausing, and resuming.
- SandboxTemplate: Defines reusable templates for creating Sandboxes, making it easier to manage large numbers of similar Sandbox configurations without duplicating definitions.
- SandboxClaim: Allows users or higher-level frameworks to request execution environments from a template, abstracting away the provisioning details. LangChain, ADK, and similar frameworks can request a sandbox via SandboxClaim without managing the underlying Sandbox configuration directly.
- SandboxWarmPool: Manages a pool of pre-warmed Sandbox pods that can be quickly allocated, reducing the time it takes to get a new sandbox running. The warm pool pattern is particularly relevant for the Kata Containers remote hypervisor, where cold start latency is higher due to external VM creation. Pre-warming trades idle compute cost for reduced provisioning latency.
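To illustrate how the template and claim resources relate, the sketch below pairs a SandboxTemplate with a SandboxClaim that references it. The apiVersion matches the project's Sandbox example, but the spec field names (podTemplate, templateRef) are illustrative assumptions, not the project's confirmed schema:

```yaml
# Illustrative only: field names under spec are assumptions
apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxTemplate
metadata:
  name: python-agent
spec:
  podTemplate:
    spec:
      containers:
        - name: agent
          image: <IMAGE>
---
# A framework (LangChain, ADK, ...) would create one of these per
# session, without managing the underlying Sandbox configuration.
apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxClaim
metadata:
  name: session-1234        # hypothetical session identifier
spec:
  templateRef:
    name: python-agent
```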
A minimal Sandbox looks like this (from the official agent sandbox documentation):
```yaml
apiVersion: agents.x-k8s.io/v1alpha1
kind: Sandbox
metadata:
  name: my-sandbox
spec:
  podTemplate:
    spec:
      containers:
        - name: my-container
          image: <IMAGE>
```
Once created, the sandbox is accessible via its stable hostname my-sandbox. The controller handles pod creation, storage binding, and lifecycle management from there.
Agent sandbox supports gVisor and Kata Containers as runtime isolation backends. Both are configured via Kubernetes runtimeClassName, making the project backend-agnostic by design:
- gVisor: Intercepts system calls in user space via its runsc runtime, reducing the kernel attack surface without requiring a full VM per workload. It provides kernel and network isolation suitable for multi-tenant, untrusted code execution.
- Kata Containers: Runs each pod inside a lightweight virtual machine, giving each workload a dedicated kernel. This provides stronger isolation at the cost of higher startup latency, which the warm pool pattern is designed to offset.
Standard container namespacing shares the host kernel across all containers on a node. A kernel-level vulnerability in any workload can affect the host and other tenants. Both gVisor and Kata Containers address this by interposing between the workload and the host kernel, limiting the impact of a kernel-level exploit to that workload.
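As a sketch of how backend selection works, the manifests below define a RuntimeClass for gVisor and route a Sandbox through it via runtimeClassName. The RuntimeClass name and handler value depend on how runsc is installed and registered with containerd on your nodes, so treat them as assumptions:

```yaml
# RuntimeClass mapping the name "gvisor" to the runsc handler
# (handler name depends on your node's containerd configuration)
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
# Sandbox whose pod runs under gVisor instead of runc
apiVersion: agents.x-k8s.io/v1alpha1
kind: Sandbox
metadata:
  name: untrusted-sandbox
spec:
  podTemplate:
    spec:
      runtimeClassName: gvisor   # select the isolation backend here
      containers:
        - name: my-container
          image: <IMAGE>
```

Swapping the backend is a one-line change to runtimeClassName, which is what makes the project backend-agnostic.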
For a detailed technical comparison of the isolation technologies the project supports, see Kata Containers vs Firecracker vs gVisor.
Agent sandbox provides the isolation primitive and the declarative API, but not the surrounding production infrastructure your platform needs.
You install it onto an existing Kubernetes cluster, which means provisioning and managing the underlying cluster infrastructure is outside its scope. The project includes configuration options for API QPS and worker counts for teams running it at scale.
For teams that need sandbox infrastructure without that operational overhead, the options are either a managed sandbox provider or a Bring Your Own Cloud (BYOC) platform that handles orchestration while deploying into your own infrastructure.
See self-hosted AI sandboxes for a breakdown of deployment models and tradeoffs.
Northflank provides production-grade sandbox infrastructure backed by Firecracker, Kata Containers, and gVisor, with orchestration, multi-tenant isolation, autoscaling, and bin-packing handled at the infrastructure level.
Sandboxes run on Northflank's managed cloud or inside your own infrastructure via the Bring Your Own Cloud (BYOC) deployment model, across AWS EKS, GKE, AKS, Oracle Kubernetes, CoreWeave, Civo, bare-metal, and on-premises distributions including OpenShift and RKE2. For teams deploying inside a customer VPC, see customer VPC deployments. BYOC is available self-serve.

Key capabilities include:
- Isolation: Firecracker, Kata Containers, or gVisor, applied per workload. End-to-end sandbox creation completes in 1-2 seconds, covering the full provisioning path.
- Ephemeral and persistent environments: Both modes supported with no forced time limits. Persistent volumes, S3-compatible object storage, and stateful databases run alongside sandboxes in the same control plane.
- Full workload runtime: APIs, workers, GPU workloads, and databases run in the same platform as sandboxes, so teams do not need a separate system as requirements grow beyond code execution.
- GPU support: NVIDIA H100, A100, L4, and others on demand.
- Compliance: SOC 2 Type 2 certified, with BYOC deployment for data residency and regulated industries.
Northflank has been running this class of workload in production since 2021 across startups, public companies, government deployments, and regulated industries. cto.new runs thousands of daily code executions on Northflank's sandbox infrastructure and scaled to 30,000+ users without infrastructure changes.
CPU is priced at $0.01667/vCPU-hour and memory at $0.00833/GB-hour. See the full GPU and compute pricing.
For a hands-on walkthrough of spinning up a secure sandbox and microVM on Northflank, see this step-by-step guide.
Get started on Northflank or book a demo with the engineering team to discuss your requirements.
The questions below cover what engineering teams most commonly ask when evaluating agent sandbox on Kubernetes.
Agent sandbox on Kubernetes is an open-source Kubernetes controller and set of CRDs developed under SIG Apps (kubernetes-sigs/agent-sandbox). It provides a declarative API for managing isolated, stateful, singleton workloads on Kubernetes, with built-in support for lifecycle management, stable identity, persistent storage, and isolation runtimes like gVisor and Kata Containers.
Agent sandbox replaces the manual combination of a StatefulSet of size 1, a headless Service, and a PersistentVolumeClaim that engineers currently use to approximate singleton stateful workloads. The Sandbox CRD models this pattern as a single resource with specialised lifecycle controls that the raw primitives do not provide.
The project supports gVisor and Kata Containers as isolation backends, configured via runtimeClassName. Both provide stronger isolation than standard container namespacing by interposing between the workload and the host kernel. The project is designed to be backend-agnostic.
The SandboxWarmPool CRD manages a pool of pre-warmed Sandbox pods. When a new sandbox is requested, the controller claims a pod from the warm pool rather than creating one from scratch, reducing startup latency. This is particularly useful for Kata Containers workloads where VM creation adds cold start overhead.
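A hypothetical SandboxWarmPool manifest might look like the following; the spec fields (templateRef, replicas) are illustrative assumptions rather than the project's confirmed schema:

```yaml
# Illustrative only: keeps a standing pool of pre-warmed sandboxes
apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxWarmPool
metadata:
  name: kata-pool
spec:
  templateRef:
    name: python-agent   # hypothetical SandboxTemplate to pre-warm
  replicas: 5            # pods kept booted and ready to be claimed
```

The replica count is the knob for the trade-off described above: more pre-warmed pods mean lower provisioning latency but more idle compute cost.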
Agent sandbox is a Kubernetes controller you install and operate on your own cluster. It provides the isolation primitive and API but not the surrounding production infrastructure. A sandbox infrastructure provider handles infrastructure operations, autoscaling, multi-tenancy, and platform-level concerns. Platforms like Northflank run the same isolation technologies (Kata Containers, gVisor, and Firecracker) on either a managed cloud or inside your own infrastructure via the Bring Your Own Cloud (BYOC) deployment model, so you get the isolation model without the operational overhead of running the controller yourself.
Yes. The project is Kubernetes-native and installs via kubectl apply. It runs on any conformant Kubernetes cluster. The GKE documentation covers a specific GKE implementation with managed gVisor and Pod Snapshots, but the core project itself is not GKE-specific.
The articles below go deeper on specific aspects of the infrastructure covered in this guide.
- Kata Containers vs Firecracker vs gVisor: A technical comparison of the isolation technologies the agent sandbox project supports as backends.
- Kubernetes multi-tenancy: Covers how Kubernetes handles workload isolation across tenants and where agent sandbox fits in that model.
- Self-hosted AI sandboxes: Covers the three deployment models for running sandbox infrastructure in your own infrastructure and how to evaluate them.
- How to sandbox AI agents: A practical guide to sandboxing agents, covering architecture patterns and isolation requirements.
- Top BYOC AI sandboxes: A comparison of sandbox providers that support deployment inside your own cloud infrastructure.
- What is an AI sandbox?: A detailed explainer on AI sandbox infrastructure and how isolation models differ.


