

Secure runtime for codegen tools: microVMs, sandboxing, and execution at scale
Code generation tools are reshaping how developers build software. Instead of writing every line by hand, engineers now use systems that generate code automatically, often using large language models (LLMs), to scaffold projects, write functions, and even deploy infrastructure.
But if you’re building a codegen tool, one problem becomes clear fast: you need to execute untrusted code securely.
You can’t risk one user breaking into another’s environment, leaking data, or escaping into your backend systems. You need speed, isolation, and safety. That’s where a secure runtime comes in: sandboxed microVMs built for ephemeral code execution.
This guide covers:
- What is codegen?
- Which codegen tool is best?
- The infrastructure needed to support safe execution
- Why secure sandboxing and microVMs matter
- How to use Northflank to run untrusted workloads at scale
💡 Northflank runs over 2 million microVMs monthly, in production since 2021. We contribute to Kata Containers, Cloud Hypervisor, QEMU, and more.
Our platform supports bring your own cloud and runs securely in your VPC. Companies like Writer and Sentry use Northflank to run untrusted, multi-tenant workloads at scale.
Building secure sandboxing with Firecracker isn’t a weekend project. We’ve already done it, so you don’t have to. Spin up isolated microVMs in seconds and skip the infrastructure burden.
At its core, codegen (short for code generation) automates the production of source code. Early tools included boilerplate generators and compilers. Today’s codegen tools use LLMs and embeddings to dynamically generate code from prompts, API specs, full repos, or other inputs.
Modern codegen tools can:
- Translate between languages
- Scaffold components or full apps
- Auto-generate tests, CLI commands, and documentation
- Execute code live and return output in real time
Some run entirely in the browser. Others spin up sandboxed execution environments to compile or run code server-side.
That’s where secure runtimes come in.
The codegen landscape is crowded. Most tools fall into two categories:
- SaaS tools using proprietary models (e.g. GPT-4, Claude)
- Open-source agents using open-weight models (e.g. CodeLlama, DeepSeek-Coder)
Execution is the key differentiator. Most proprietary tools bundle it in; open-source agents require you to bring your own sandbox runtime.
Here are the best codegen tools on the market right now.
| Tool / Agent | Core model(s) | Open source | Executes code? | Execution environment | Notes |
|---|---|---|---|---|---|
| GitHub Copilot | GPT‑4‑turbo | No | ❌ | None | IDE-only; no runtime |
| Cursor | GPT‑4, Claude | No | ✅ | Agent + server-side sandbox | Secure runtime with sandboxed agents |
| Cody (Sourcegraph) | Claude + embeddings | Partial | ⚠️ Optional | Local or cloud backend | Execution plug-in optional |
| Continue | Configurable OSS LLMs | ✅ | ⚠️ Optional | User‑defined | Backend and sandbox left to user |
| DeepSeek‑Coder | DeepSeek‑V3 | ✅ | ❌ | None | Model-only |
| Replit Ghostwriter | Proprietary | No | ✅ | Replit-hosted runtime | In-IDE execution |
| Lovable | Claude, GPT‑4 | No | ✅ | Browser-based sandbox | Client-side JS sandbox |
| EngineLabs | Claude, DeepSeek | No | ✅ | Server-side isolated runners | Secure remote execution |
| VibeKit | Codex, Claude Code, Gemini | ✅ | ✅ | Supports Daytona, Modal, E2B | SDK for sandboxed remote execution in secure environments |
| OpenInterpreter | GPTs, Claude | ✅ | ✅ | CLI and browser eval | Local inline eval |
| Ghostwriter CLI | OSS / Mix | ✅ | ✅ | Local shell backend | CLI agent execution |
| CodeGeeX | CodeGeeX2 | ✅ | ❌ | None | Model-only |
| CodeLlama 70B | Meta | ✅ | ❌ | None | Foundation model |
| StarCoder2 | BigCode | ✅ | ❌ | None | Foundation model |
| Phi‑3 Mini | Microsoft | ✅ | ❌ | None | Lightweight dev model |
If you want to support real code execution, you’ll need to build a secure runtime. That means isolating each user in a sandbox environment with resource and network boundaries.
It only takes one user to break things. If your codegen tool runs generated Python, JavaScript, or shell commands, especially from arbitrary inputs, you’re opening yourself up to:
- Privilege escalation
- Container escape
- Cross-tenant access
- Denial-of-service
Containers alone don’t cut it. They share the host kernel. A misconfigured capability or kernel exploit can compromise your backend or other users.
To truly isolate untrusted code, you need VM-level separation, but traditional VMs are too slow. You don’t want users waiting 10+ seconds to get a response.
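To make the gap concrete, here is a minimal sketch of process-level sandboxing in Python (Linux/Unix only), using rlimits and a timeout. Limits like these bound resource use and mitigate denial-of-service, but the child process still shares the host kernel, so they do nothing about container escape or cross-tenant access; that is what VM-level separation adds.

```python
import resource
import subprocess
import sys

def run_untrusted(code: str, timeout_s: int = 5,
                  mem_bytes: int = 512 * 1024 * 1024) -> str:
    """Run a Python snippet in a child process with CPU and memory rlimits.

    Process-level limits bound resource use, but the child still shares
    the host kernel: this is NOT a substitute for VM-level isolation.
    """
    def apply_limits() -> None:
        # Cap CPU seconds and virtual address space before the child execs.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True,
        timeout=timeout_s, preexec_fn=apply_limits,
    )
    return proc.stdout

print(run_untrusted("print(2 + 2)"))
```

Even hardened versions of this pattern (seccomp filters, dropped capabilities, user namespaces) are one kernel bug away from compromise, which is the argument for a separate guest kernel per workload.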
That’s why companies like Northflank use microVMs.
MicroVMs are lightweight virtual machines designed for fast-start, short-lived workloads. They combine container-like performance with VM-grade security isolation.
Firecracker is a microVM runtime developed by AWS. It powers Lambda and Fargate, offering boot times under 200ms. Other runtimes like Kata Containers build on Firecracker to support OCI-compliant containers in VM-isolated environments.
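Firecracker is driven through a small REST API served over a Unix socket. As a sketch of how minimal that surface is, the code below configures and boots a microVM (kernel image, rootfs path, and socket path are placeholders, and a `firecracker` process is assumed to already be listening on the socket):

```python
import http.client
import json
import socket

class FirecrackerConnection(http.client.HTTPConnection):
    """HTTP client over Firecracker's Unix-domain API socket."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

def boot_sequence(kernel: str, rootfs: str, vcpus: int = 1, mem_mib: int = 128):
    """Ordered (method, path, body) API calls that boot a microVM."""
    return [
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source", {"kernel_image_path": kernel,
                                 "boot_args": "console=ttyS0 reboot=k panic=1"}),
        ("PUT", "/drives/rootfs", {"drive_id": "rootfs", "path_on_host": rootfs,
                                   "is_root_device": True, "is_read_only": False}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]

def boot(socket_path: str, kernel: str, rootfs: str) -> None:
    conn = FirecrackerConnection(socket_path)
    for method, path, body in boot_sequence(kernel, rootfs):
        conn.request(method, path, json.dumps(body),
                     {"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()  # drain the response so the connection can be reused
        assert resp.status == 204, f"{path} failed with HTTP {resp.status}"

# boot("/tmp/firecracker.socket", "vmlinux.bin", "rootfs.ext4")
```

The hard part in production is not this boot sequence but everything around it: image plumbing, networking, snapshotting, and scheduling thousands of these VMs, which is what orchestration layers like Kata handle.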
With Firecracker or Kata, each workload runs:
- In a sandboxed environment with its own kernel
- With fully separated network and memory namespaces
- Under strict CPU, memory, and disk quotas
- With no access to host processes or containers
Perfect for executing untrusted code from a user’s LLM prompt.
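On Kubernetes, Kata is typically wired in via a RuntimeClass, and a pod opts into microVM isolation with a single field plus the usual resource quotas. A minimal sketch (the handler name `kata` and the limits are illustrative, and must match how the runtime is installed on your nodes):

```python
import json

# RuntimeClass mapping pods to the Kata (microVM) runtime handler.
# The handler name must match the containerd/CRI-O config on the node.
runtime_class = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "kata"},
    "handler": "kata",
}

# A pod that opts into microVM isolation and strict resource quotas.
sandbox_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "codegen-sandbox"},
    "spec": {
        "runtimeClassName": "kata",  # this pod boots with its own guest kernel
        "restartPolicy": "Never",
        "containers": [{
            "name": "runner",
            "image": "python:3.12-slim",
            "command": ["python", "-c", "print('hello from a microVM')"],
            "resources": {
                "limits": {"cpu": "500m", "memory": "256Mi",
                           "ephemeral-storage": "1Gi"},
            },
        }],
    },
}

print(json.dumps(sandbox_pod, indent=2))  # JSON is valid input to `kubectl apply -f -`
```

The appeal of this model is that the workload spec stays a plain OCI container; only the `runtimeClassName` field changes where and how it is isolated.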
Start with your model. Fine-tuned open-weight LLMs like CodeLlama, StarCoder2, or DeepSeek-Coder-V3 can be served using frameworks like vLLM on GPUs. (These models can also be self-hosted on Northflank, which offers cost-efficient on-demand GPU pricing.)
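vLLM exposes an OpenAI-compatible HTTP API, so a codegen backend can query a self-hosted model with plain HTTP. A sketch, assuming a server already running on port 8000 (e.g. started with `python -m vllm.entrypoints.openai.api_server --model <model>`); the model name, port, and sampling parameters here are illustrative:

```python
import json
import urllib.request

def completion_request(prompt: str,
                       base_url: str = "http://localhost:8000/v1",
                       model: str = "deepseek-ai/deepseek-coder-6.7b-instruct"):
    """Build an OpenAI-compatible completion request for a vLLM server."""
    payload = {"model": model, "prompt": prompt,
               "max_tokens": 256, "temperature": 0.2}
    return urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def generate(prompt: str) -> str:
    """Send the request and return the first completion's text."""
    with urllib.request.urlopen(completion_request(prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

# generate("# Python function that reverses a string\ndef ")
```

Because the API shape matches OpenAI's, swapping between a hosted model and a self-hosted one is mostly a matter of changing `base_url` and `model`.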
But once your codegen tool needs to execute code, you’ll hit the secure runtime wall.
Most teams either:
- Build fragile Firecracker orchestration in-house
- Try to bolt Kata onto Kubernetes
- Give up on execution altogether
This is what Northflank solves.
Northflank lets you spin up microVM-backed containers in seconds. It uses Kata Containers under the hood, giving you Firecracker-grade security without the ops pain.
Here’s what the setup looks like:
Each project runs in a fully separated namespace. You can scope by user, tenant, team, or use case. Choose your region, bring your own cloud (BYOC), or run multi-region. No noisy neighbor risk.
Deploy any container image. Northflank provisions a secure microVM, pulls the image, and runs it with full isolation. Every workload gets its own kernel and vNIC.
Use a Dockerfile? Northflank spins up an ephemeral runner, builds your image, and deploys it straight into a microVM-backed service.
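What does such a workload look like from the inside? Below is a minimal sketch, stdlib only, of the kind of execution endpoint you might package into that image: it accepts POSTed code and runs it in a subprocess with a timeout, relying on the surrounding microVM rather than this process as the security boundary. The route, payload shape, and port are assumptions, not a Northflank API.

```python
import json
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

class RunHandler(BaseHTTPRequestHandler):
    """Minimal execution endpoint: POST {"code": "..."} and get back
    stdout/stderr/exit code as JSON. The microVM around this process,
    not the process itself, is the isolation layer."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        try:
            proc = subprocess.run(
                [sys.executable, "-c", body["code"]],
                capture_output=True, text=True, timeout=10,
            )
            result = {"stdout": proc.stdout, "stderr": proc.stderr,
                      "exit": proc.returncode}
        except subprocess.TimeoutExpired:
            result = {"error": "timeout"}
        data = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(port: int = 8080) -> None:
    HTTPServer(("0.0.0.0", port), RunHandler).serve_forever()

# serve()  # run inside the container; the microVM provides the sandbox
```

With the VM as the trust boundary, the service itself can stay this simple: crash it, fill its disk, or exhaust its memory, and only that one ephemeral sandbox is affected.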
You get:
- Strong runtime isolation
- Full CI/CD baked in
- Support for persistent or ephemeral execution
- Automatic cleanup + monitoring
If you’re building a codegen tool that runs code:
- You need a secure sandbox
- You need it to start fast
- You need to scale it without handholding infra
Northflank gives you:
- Secure runtime execution using microVMs
- Firecracker-based isolation with Kata
- Autoscaling, ephemeral or persistent sandboxes
- Multi-region, BYOC, GPU support
- Built-in observability and CI/CD
Whether you’re building the next Copilot or a CLI command generator, securely executing untrusted code should not be an afterthought.
Most teams focus on the model, not the infrastructure. But if you run user-submitted code, even briefly, you need a secure runtime environment from day one.
Containers aren’t enough. VMs are too slow. MicroVMs are the middle ground, and Northflank gives you the easiest way to deploy them at scale.
Build a safer, faster, more scalable codegen tool, without building your own sandbox platform.