Will Stewart
Published 10th July 2025

Secure runtime for codegen tools: microVMs, sandboxing, and execution at scale

Code generation tools are reshaping how developers build software. Instead of writing every line by hand, engineers now use systems that generate code automatically, often using large language models (LLMs), to scaffold projects, write functions, and even deploy infrastructure.

But if you’re building a codegen tool, one problem becomes clear fast: you need to execute untrusted code securely.

You can’t risk one user breaking into another’s environment, leaking data, or escaping into your backend systems. You need speed, isolation, and safety. That’s where a secure runtime comes in: specifically, sandboxed microVMs built for ephemeral code execution.

This guide covers:

  • What is codegen?
  • Which codegen tool is best?
  • The infrastructure needed to support safe execution
  • Why secure sandboxing and microVMs matter
  • How to use Northflank to run untrusted workloads at scale

💡 Northflank runs over 2 million microVMs monthly, in production since 2021. We contribute to Kata Containers, Cloud Hypervisor, QEMU, and more.

Our platform supports bring your own cloud and runs securely in your VPC. Companies like Writer and Sentry use Northflank to run untrusted, multi-tenant workloads at scale.

Building secure sandboxing with Firecracker isn’t a weekend project. We’ve already done it, so you don’t have to. Spin up isolated microVMs in seconds and skip the infrastructure burden.

What is codegen?

At its core, codegen (short for code generation) automates the production of source code. Early tools included boilerplate generators and compilers. Today’s codegen tools use LLMs and embeddings to dynamically generate code from prompts, API specs, full repos, or other inputs.

Modern codegen tools can:

  • Translate between languages
  • Scaffold components or full apps
  • Auto-generate tests, CLI commands, and documentation
  • Execute code live and return output in real time

Some run entirely in the browser. Others spin up sandboxed execution environments to compile or run code server-side.

That’s where secure runtimes come in.

Which codegen tool is the best?

The codegen landscape is crowded. Most tools fall into two categories:

  • SaaS tools using proprietary models (e.g. GPT-4, Claude)
  • Open-source agents using open-weight models (e.g. CodeLlama, DeepSeek-Coder)

Execution is the key differentiator. Most proprietary tools bundle it in; open-source agents require you to bring your own sandbox runtime.

Here are the best codegen tools on the market right now.

| Tool / Agent | Core model(s) | Open source | Executes code? | Execution environment | Notes |
|---|---|---|---|---|---|
| GitHub Copilot | GPT‑4‑turbo | No | | None | IDE-only; no runtime |
| Cursor | GPT‑4, Claude | No | | Agent + server-side sandbox | Secure runtime with sandboxed agents |
| Cody (Sourcegraph) | Claude + embeddings | Partial | ⚠️ Optional | Local or cloud backend | Execution plug-in optional |
| Continue | Configurable OSS LLMs | | ⚠️ Optional | User‑defined | Backend and sandbox left to user |
| DeepSeek‑Coder | DeepSeek‑V3 | | | None | Model-only |
| Replit Ghostwriter | Proprietary | No | | Replit-hosted runtime | In-IDE execution |
| Lovable | Claude, GPT‑4 | No | | Browser-based sandbox | Client-side JS sandbox |
| EngineLabs | Claude, DeepSeek | No | | Server-side isolated runners | Secure remote execution |
| VibeKit | Codex, Claude Code, Gemini | | | Supports Daytona, Modal, E2B | SDK for sandboxed remote execution in secure environments |
| OpenInterpreter | GPTs, Claude | | | CLI and browser eval | Local inline eval |
| Ghostwriter CLI | OSS / Mix | | | Local shell backend | CLI agent execution |
| CodeGeeX | CodeGeeX2 | | | None | Model-only |
| CodeLlama 70B | Meta | | | None | Foundation model |
| StarCoder2 | BigCode | | | None | Foundation model |
| Phi‑3 Mini | Microsoft | | | None | Lightweight dev model |

If you want to support real code execution, you’ll need to build a secure runtime. That means isolating each user in a sandbox environment with resource and network boundaries.

Code execution is a security risk

It only takes one user to break things. If your codegen tool runs generated Python, JavaScript, or shell commands, especially from arbitrary inputs, you’re opening yourself up to:

  • Privilege escalation
  • Container escape
  • Cross-tenant access
  • Denial-of-service

Containers alone don’t cut it. They share the host kernel. A misconfigured capability or kernel exploit can compromise your backend or other users.
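
To make the gap concrete, here is a minimal sketch (Linux only, Python standard library) of process-level limits: it caps CPU time and address space for a child interpreter. The name `run_limited` and the default limits are illustrative assumptions, not part of any tool mentioned in this post. Limits like these blunt denial-of-service, but the child still makes system calls against the same host kernel, so they do nothing against a kernel exploit.

```python
import resource
import subprocess
import sys


def _cap(kind: int, value: int) -> None:
    """Lower one rlimit to `value`, never exceeding the current hard limit."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(code, cpu_seconds=1, memory_bytes=512 * 1024 * 1024, wall_timeout=10.0):
    """Run `code` in a child Python process with CPU and memory caps (hypothetical helper)."""

    def apply_limits() -> None:
        # Executed in the child process just before the interpreter starts.
        _cap(resource.RLIMIT_CPU, cpu_seconds)   # CPU seconds, enforced by the kernel
        _cap(resource.RLIMIT_AS, memory_bytes)   # total virtual address space

    return subprocess.run(
        [sys.executable, "-I", "-c", code],      # -I: isolated mode, ignores user site and env vars
        capture_output=True,
        text=True,
        timeout=wall_timeout,                    # wall-clock backstop for sleeping or blocked code
        preexec_fn=apply_limits,
    )


if __name__ == "__main__":
    print(run_limited("print(2 + 2)").stdout.strip())
```

A busy loop such as `run_limited("while True: pass")` is terminated by a signal once it exhausts its CPU budget, which shows up as a negative `returncode`. Nothing here restricts the filesystem, the network, or the system calls the child can make.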

To truly isolate untrusted code, you need VM-level separation, but traditional VMs are too slow. You don’t want users waiting 10+ seconds to get a response.

That’s why companies like Northflank use microVMs.

What are microVMs? (and what is Firecracker?)

MicroVMs are lightweight virtual machines designed for fast-start, short-lived workloads. They combine container-like performance with VM-grade security isolation.

What is Firecracker?

Firecracker is an open-source virtual machine monitor (VMM) developed by AWS that runs microVMs on top of Linux KVM. It powers Lambda and Fargate, offering boot times under 200ms. Runtimes like Kata Containers can use Firecracker, or other hypervisors such as Cloud Hypervisor and QEMU, to run OCI-compliant containers in VM-isolated environments.

With Firecracker or Kata, each workload runs:

  • In a sandboxed environment with its own kernel
  • With its own network interface and memory, separate from other workloads
  • Under strict CPU, memory, and disk quotas
  • With no access to host processes or other containers

Perfect for executing untrusted code from a user’s LLM prompt.
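
As a rough sketch of what driving Firecracker directly involves, the block below builds the sequence of API calls that size, configure, and boot one microVM. Firecracker exposes an HTTP API on a Unix socket (the one passed via `--api-sock`). The helper names, the kernel and rootfs paths, and the default sizes are placeholders chosen for illustration; this is not how Northflank or any tool above is implemented.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket, which is how Firecracker exposes its API."""

    def __init__(self, socket_path: str) -> None:
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def microvm_requests(kernel_path, rootfs_path, vcpus=1, mem_mib=256):
    """Return the ordered (method, path, body) calls that configure and boot one microVM."""
    return [
        # Hard caps on vCPUs and guest memory.
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        # The guest boots its own kernel instead of sharing the host's.
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        # Root filesystem image attached as a block device.
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def boot_microvm(socket_path, requests):
    """Send each call to a running Firecracker process; raise on the first failure."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        response.read()
        conn.close()
        if response.status >= 300:
            raise RuntimeError(f"{method} {path} failed with HTTP {response.status}")
```

With a Firecracker process already listening, `boot_microvm("/tmp/firecracker.socket", microvm_requests("vmlinux.bin", "rootfs.ext4"))` would start the guest (all three paths are placeholders). Networking, image preparation, cleanup, and scheduling across hosts are all left out, and that is where most of the orchestration work lives.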

How to build a secure codegen tool (without becoming a platform company)

Start with your model. Fine-tuned open-weight LLMs like CodeLlama, StarCoder2, or DeepSeek-Coder can be served using frameworks like vLLM on GPUs.

(These models can also be self-hosted on Northflank, which offers cost-efficient on-demand GPU pricing.)
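
For the serving side, a minimal client sketch follows. It assumes a vLLM server is already running with its OpenAI-compatible API (for example, started with `vllm serve <model>`) and reachable at `http://localhost:8000`. The model name, the URL, and the function names are assumptions for illustration only.

```python
import json
import urllib.request


def build_completion_request(prompt, model="bigcode/starcoder2-3b",
                             base_url="http://localhost:8000", max_tokens=256):
    """Build a POST to the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,          # must match the model the server was started with
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.2,      # low temperature for more repeatable code
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a live server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["choices"][0]["text"]
```

Whatever `generate` returns is untrusted output: it should only ever be executed inside the sandbox, never in the process that called the model.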

But once your codegen tool needs to execute code, you’ll hit the secure runtime wall.

Most teams either:

  • Build fragile Firecracker orchestration in-house
  • Try to bolt Kata onto Kubernetes
  • Give up on execution altogether
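
To give a sense of the Kubernetes route, the sketch below builds a Pod manifest that requests a VM-isolated runtime through the standard `runtimeClassName` field. It assumes a cluster where Kata Containers is already installed and a RuntimeClass named `kata` exists; that name, the helper function, the image, and the limits are all illustrative.

```python
import json


def kata_pod_manifest(name, image, command, runtime_class="kata",
                      cpu="500m", memory="256Mi", deadline_seconds=30):
    """Return a Pod manifest (as a dict) for one short-lived sandboxed run."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class,        # selects the Kata (microVM) runtime
            "restartPolicy": "Never",                 # one-shot execution
            "activeDeadlineSeconds": deadline_seconds,  # kill runs that overstay
            "automountServiceAccountToken": False,    # no cluster credentials in the sandbox
            "containers": [{
                "name": "sandbox",
                "image": image,
                "command": command,
                "resources": {"limits": {"cpu": cpu, "memory": memory}},
                "securityContext": {"allowPrivilegeEscalation": False},
            }],
        },
    }


if __name__ == "__main__":
    manifest = kata_pod_manifest("run-1234", "python:3.12-slim",
                                 ["python", "-c", "print('hi')"])
    print(json.dumps(manifest, indent=2))  # kubectl accepts JSON manifests too
```

The manifest is the easy part. The nodes need hardware virtualization available, the runtime has to be installed and kept patched on each of them, and network policy, image caching, and cleanup still have to be designed around it.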

This is what Northflank solves.

Northflank: Secure runtime for codegen workloads

Northflank lets you spin up microVM-backed containers in seconds. It uses Kata Containers under the hood, giving you Firecracker-grade security without the ops pain.

Here’s what the setup looks like:

Step 1: Multi-tenant isolation

Each project runs in a fully separated namespace. You can scope by user, tenant, team, or use case. Choose your region, bring your own cloud (BYOC), or run multi-region. No noisy neighbor risk.

Step 2: microVM-backed execution

Deploy any container image. Northflank provisions a secure microVM, pulls the image, and runs it with full isolation. Every workload gets its own kernel and vNIC.

Step 3: Optional Docker builds

Use a Dockerfile? Northflank spins up an ephemeral runner, builds your image, and deploys it straight into a microVM-backed service.

You get:

  • Strong runtime isolation
  • Full CI/CD baked in
  • Support for persistent or ephemeral execution
  • Automatic cleanup + monitoring

Why Northflank is the best platform for secure code execution

If you’re building a codegen tool that runs code:

  • You need a secure sandbox
  • You need it to start fast
  • You need to scale it without hand-holding your infrastructure

Northflank gives you:

  • Secure runtime execution using microVMs
  • Firecracker-based isolation with Kata
  • Autoscaling, ephemeral or persistent sandboxes
  • Multi-region, BYOC, GPU support
  • Built-in observability and CI/CD

Whether you’re building the next Copilot or a CLI command generator, securely executing untrusted code should not be an afterthought.

Don’t wait to solve secure execution

Most teams focus on the model, not the infrastructure. But if you run user-submitted code, even briefly, you need a secure runtime environment from day one.

Containers aren’t enough. VMs are too slow. MicroVMs are the middle ground, and Northflank gives you the easiest way to deploy them at scale.

Build a safer, faster, more scalable codegen tool, without building your own sandbox platform.

👉 Try secure microVMs on Northflank
