

Secure runtime for codegen tools: microVMs, sandboxing, and execution at scale
Code generation tools are reshaping how developers build software. Instead of writing every line by hand, engineers now use systems that generate code automatically, often using large language models (LLMs), to scaffold projects, write functions, and even deploy infrastructure.
But if you’re building a codegen tool, one problem becomes clear fast: you need to execute untrusted code securely.
You can’t risk one user breaking into another’s environment, leaking data, or escaping into your backend systems. You need speed, isolation, and safety. That’s where a secure runtime comes in: sandboxed microVMs built for ephemeral code execution.
This guide covers:
- What is codegen?
- Which codegen tool is best?
- The infrastructure needed to support safe execution
- Why secure sandboxing and microVMs matter
- How to use Northflank to run untrusted workloads at scale
💡 Northflank runs over 2 million microVMs monthly, in production since 2021. We contribute to Kata Containers, Cloud Hypervisor, QEMU, and more.
Our platform supports bring your own cloud and runs securely in your VPC. Companies like Writer and Sentry use Northflank to run untrusted, multi-tenant workloads at scale.
Building secure sandboxing with Firecracker isn’t a weekend project. We’ve already done it, so you don’t have to. Spin up isolated microVMs in seconds and skip the infrastructure burden.
At its core, codegen (short for code generation) automates the production of source code. Early tools included boilerplate generators and compilers. Today’s codegen tools use LLMs and embeddings to dynamically generate code from prompts, API specs, full repos, or other inputs.
Modern codegen tools can:
- Translate between languages
- Scaffold components or full apps
- Auto-generate tests, CLI commands, and documentation
- Execute code live and return output in real time
Some run entirely in the browser. Others spin up sandboxed execution environments to compile or run code server-side.
That’s where secure runtimes come in.
The codegen landscape is crowded. Most tools fall into two categories:
- SaaS tools using proprietary models (e.g. GPT-4, Claude)
- Open-source agents using open-weight models (e.g. CodeLlama, DeepSeek-Coder)
Execution is the key differentiator. Most proprietary tools bundle it in; open-source agents require you to bring your own sandbox runtime.
Here are the best codegen tools on the market right now.
| Tool / Agent | Core model(s) | Open source | Executes code? | Execution environment | Notes |
|---|---|---|---|---|---|
| GitHub Copilot | GPT‑4‑turbo | No | ❌ | None | IDE-only; no runtime |
| Cursor | GPT‑4, Claude | No | ✅ | Agent + server-side sandbox | Secure runtime with sandboxed agents |
| Cody (Sourcegraph) | Claude + embeddings | Partial | ⚠️ Optional | Local or cloud backend | Execution plug-in optional |
| Continue | Configurable OSS LLMs | ✅ | ⚠️ Optional | User‑defined | Backend and sandbox left to user |
| DeepSeek‑Coder | DeepSeek‑V3 | ✅ | ❌ | None | Model-only |
| Replit Ghostwriter | Proprietary | No | ✅ | Replit-hosted runtime | In-IDE execution |
| Lovable | Claude, GPT‑4 | No | ✅ | Browser-based sandbox | Client-side JS sandbox |
| EngineLabs | Claude, DeepSeek | No | ✅ | Server-side isolated runners | Secure remote execution |
| VibeKit | Codex, Claude Code, Gemini | ✅ | ✅ | Supports Daytona, Modal, E2B | SDK for sandboxed remote execution in secure environments |
| OpenInterpreter | GPTs, Claude | ✅ | ✅ | CLI and browser eval | Local inline eval |
| Ghostwriter CLI | OSS / Mix | ✅ | ✅ | Local shell backend | CLI agent execution |
| CodeGeeX | CodeGeeX2 | ✅ | ❌ | None | Model-only |
| CodeLlama 70B | Meta | ✅ | ❌ | None | Foundation model |
| StarCoder2 | BigCode | ✅ | ❌ | None | Foundation model |
| Phi‑3 Mini | Microsoft | ✅ | ❌ | None | Lightweight dev model |
If you want to support real code execution, you’ll need to build a secure runtime. That means isolating each user in a sandbox environment with resource and network boundaries.
It only takes one user to break things. If your codegen tool runs generated Python, JavaScript, or shell commands, especially from arbitrary inputs, you’re opening yourself up to:
- Privilege escalation
- Container escape
- Cross-tenant access
- Denial-of-service
Containers alone don’t cut it. They share the host kernel. A misconfigured capability or kernel exploit can compromise your backend or other users.
To truly isolate untrusted code, you need VM-level separation, but traditional VMs are too slow. You don’t want users waiting 10+ seconds to get a response.
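To make the gap concrete, here is a minimal sketch of process-level sandboxing in Python (Linux/Unix only), using rlimits and a timeout. Limits like these bound resource use and mitigate denial-of-service, but the child process still shares the host kernel, so they do nothing about container escape or cross-tenant access; that is what VM-level separation adds.

```python
import resource
import subprocess
import sys

def run_untrusted(code: str, timeout_s: int = 5,
                  mem_bytes: int = 512 * 1024 * 1024) -> str:
    """Run a Python snippet in a child process with CPU and memory rlimits.

    Process-level limits bound resource use, but the child still shares
    the host kernel: this is NOT a substitute for VM-level isolation.
    """
    def apply_limits() -> None:
        # Cap CPU seconds and virtual address space before the child execs.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True,
        timeout=timeout_s, preexec_fn=apply_limits,
    )
    return proc.stdout

print(run_untrusted("print(2 + 2)"))
```

Even hardened versions of this pattern (seccomp filters, dropped capabilities, user namespaces) are one kernel bug away from compromise, which is the argument for a separate guest kernel per workload.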
That’s why companies like Northflank use microVMs.
MicroVMs are lightweight virtual machines designed for fast-start, short-lived workloads. They combine container-like performance with VM-grade security isolation.
Firecracker is a microVM runtime developed by AWS. It powers Lambda and Fargate, offering boot times under 200ms. Other runtimes like Kata Containers build on Firecracker to support OCI-compliant containers in VM-isolated environments.
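Firecracker is driven through a small REST API served over a Unix socket. As a sketch of how minimal that surface is, the code below configures and boots a microVM (kernel image, rootfs path, and socket path are placeholders, and a `firecracker` process is assumed to already be listening on the socket):

```python
import http.client
import json
import socket

class FirecrackerConnection(http.client.HTTPConnection):
    """HTTP client over Firecracker's Unix-domain API socket."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

def boot_sequence(kernel: str, rootfs: str, vcpus: int = 1, mem_mib: int = 128):
    """Ordered (method, path, body) API calls that boot a microVM."""
    return [
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source", {"kernel_image_path": kernel,
                                 "boot_args": "console=ttyS0 reboot=k panic=1"}),
        ("PUT", "/drives/rootfs", {"drive_id": "rootfs", "path_on_host": rootfs,
                                   "is_root_device": True, "is_read_only": False}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]

def boot(socket_path: str, kernel: str, rootfs: str) -> None:
    conn = FirecrackerConnection(socket_path)
    for method, path, body in boot_sequence(kernel, rootfs):
        conn.request(method, path, json.dumps(body),
                     {"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()  # drain the response so the connection can be reused
        assert resp.status == 204, f"{path} failed with HTTP {resp.status}"

# boot("/tmp/firecracker.socket", "vmlinux.bin", "rootfs.ext4")
```

The hard part in production is not this boot sequence but everything around it: image plumbing, networking, snapshotting, and scheduling thousands of these VMs, which is what orchestration layers like Kata handle.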
With Firecracker or Kata, each workload runs:
- In a sandboxed environment with its own kernel
- With fully separated network and memory namespaces
- Under strict CPU, memory, and disk quotas
- With no access to host processes or containers
Perfect for executing untrusted code from a user’s LLM prompt.
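On Kubernetes, Kata is typically wired in via a RuntimeClass, and a pod opts into microVM isolation with a single field plus the usual resource quotas. A minimal sketch (the handler name `kata` and the limits are illustrative, and must match how the runtime is installed on your nodes):

```python
import json

# RuntimeClass mapping pods to the Kata (microVM) runtime handler.
# The handler name must match the containerd/CRI-O config on the node.
runtime_class = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "kata"},
    "handler": "kata",
}

# A pod that opts into microVM isolation and strict resource quotas.
sandbox_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "codegen-sandbox"},
    "spec": {
        "runtimeClassName": "kata",  # this pod boots with its own guest kernel
        "restartPolicy": "Never",
        "containers": [{
            "name": "runner",
            "image": "python:3.12-slim",
            "command": ["python", "-c", "print('hello from a microVM')"],
            "resources": {
                "limits": {"cpu": "500m", "memory": "256Mi",
                           "ephemeral-storage": "1Gi"},
            },
        }],
    },
}

print(json.dumps(sandbox_pod, indent=2))  # JSON is valid input to `kubectl apply -f -`
```

The appeal of this model is that the workload spec stays a plain OCI container; only the `runtimeClassName` field changes where and how it is isolated.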
Start with your model. Fine-tuned open-weight LLMs like CodeLlama, StarCoder2, or DeepSeek-Coder-V3 can be served using frameworks like vLLM on GPUs. (These models can also be self-hosted on Northflank, which offers cost-efficient on-demand GPU pricing.)
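vLLM exposes an OpenAI-compatible HTTP API, so a codegen backend can query a self-hosted model with plain HTTP. A sketch, assuming a server already running on port 8000 (e.g. started with `python -m vllm.entrypoints.openai.api_server --model <model>`); the model name, port, and sampling parameters here are illustrative:

```python
import json
import urllib.request

def completion_request(prompt: str,
                       base_url: str = "http://localhost:8000/v1",
                       model: str = "deepseek-ai/deepseek-coder-6.7b-instruct"):
    """Build an OpenAI-compatible completion request for a vLLM server."""
    payload = {"model": model, "prompt": prompt,
               "max_tokens": 256, "temperature": 0.2}
    return urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def generate(prompt: str) -> str:
    """Send the request and return the first completion's text."""
    with urllib.request.urlopen(completion_request(prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

# generate("# Python function that reverses a string\ndef ")
```

Because the API shape matches OpenAI's, swapping between a hosted model and a self-hosted one is mostly a matter of changing `base_url` and `model`.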
But once your codegen tool needs to execute code, you’ll hit the secure runtime wall.
Most teams either:
- Build fragile Firecracker orchestration in-house
- Try to bolt Kata onto Kubernetes
- Give up on execution altogether
This is what Northflank solves.
Northflank lets you spin up microVM-backed containers in seconds. It uses Kata Containers under the hood, giving you Firecracker-grade security without the ops pain.
Here’s what the setup looks like:
Each project runs in a fully separated namespace. You can scope by user, tenant, team, or use case. Choose your region, bring your own cloud (BYOC), or run multi-region. No noisy neighbor risk.
Deploy any container image. Northflank provisions a secure microVM, pulls the image, and runs it with full isolation. Every workload gets its own kernel and vNIC.
Use a Dockerfile? Northflank spins up an ephemeral runner, builds your image, and deploys it straight into a microVM-backed service.
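What does such a workload look like from the inside? Below is a minimal sketch, stdlib only, of the kind of execution endpoint you might package into that image: it accepts POSTed code and runs it in a subprocess with a timeout, relying on the surrounding microVM rather than this process as the security boundary. The route, payload shape, and port are assumptions, not a Northflank API.

```python
import json
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

class RunHandler(BaseHTTPRequestHandler):
    """Minimal execution endpoint: POST {"code": "..."} and get back
    stdout/stderr/exit code as JSON. The microVM around this process,
    not the process itself, is the isolation layer."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        try:
            proc = subprocess.run(
                [sys.executable, "-c", body["code"]],
                capture_output=True, text=True, timeout=10,
            )
            result = {"stdout": proc.stdout, "stderr": proc.stderr,
                      "exit": proc.returncode}
        except subprocess.TimeoutExpired:
            result = {"error": "timeout"}
        data = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(port: int = 8080) -> None:
    HTTPServer(("0.0.0.0", port), RunHandler).serve_forever()

# serve()  # run inside the container; the microVM provides the sandbox
```

With the VM as the trust boundary, the service itself can stay this simple: crash it, fill its disk, or exhaust its memory, and only that one ephemeral sandbox is affected.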
You get:
- Strong runtime isolation
- Full CI/CD baked in
- Support for persistent or ephemeral execution
- Automatic cleanup + monitoring
If you’re building a codegen tool that runs code:
- You need a secure sandbox
- You need it to start fast
- You need to scale it without handholding infra
Northflank gives you:
- Secure runtime execution using microVMs
- Firecracker-based isolation with Kata
- Autoscaling, ephemeral or persistent sandboxes
- Multi-region, BYOC, GPU support
- Built-in observability and CI/CD
Whether you’re building the next Copilot or a CLI command generator, securely executing untrusted code should not be an afterthought.
Most teams focus on the model, not the infrastructure. But if you run user-submitted code, even briefly, you need a secure runtime environment from day one.
Containers aren’t enough. VMs are too slow. MicroVMs are the middle ground, and Northflank gives you the easiest way to deploy them at scale.
Build a safer, faster, more scalable codegen tool, without building your own sandbox platform.