
Benchmarking Firecracker Boot and Restore

Boot a VM in 1.1 seconds. Restore one in 176 milliseconds. We measured where every millisecond of both pipelines goes.

Firecracker’s spec says it boots a VM in under 125ms. That number is real, but it only measures one piece — the time from InstanceStart to the guest kernel reaching /sbin/init. It doesn’t include host-side setup, Firecracker API configuration, or waiting for the guest to actually be usable.

We run Firecracker in production at numaVM, where every developer workspace is its own microVM. We instrumented two full pipelines — cold boot and snapshot restore — from “nothing exists” to “SSH works,” and measured where the milliseconds actually go.

The headline numbers: 1,133ms for a cold boot. 176ms for a snapshot restore. This post breaks down both.

We couldn’t find detailed end-to-end restore benchmarks published anywhere. Firecracker’s docs describe the snapshotting mechanism, and Marc Brooker (Firecracker co-author) has mentioned 4–10ms for the restore API call itself — but nobody seems to have published the full orchestration pipeline with every step timed. So we’re sharing ours.

All numbers are from 5-run benchmarks on AWS EC2 ARM (Graviton), NVMe storage with XFS, Firecracker v1.14.2, Alpine Linux guest, 256 MiB RAM, 2 vCPUs. Variance was ±25ms across runs.


Cold Boot: 1,133ms End-to-End

A cold boot is everything: create the filesystem, configure Firecracker, boot the kernel, and wait for the guest to accept connections. Here’s the full sequence with measured durations:

 Phase 1: Host Setup (parallel)                        0–80ms
 ├── Copy rootfs (cp --reflink on XFS)                  23ms
 ├── Create TAP device + flush ARP                      30ms
 ├── Create data volume (dd + mkfs.ext4)                30ms
 └── Spawn Firecracker via systemd-run                  50ms

 Phase 2: Firecracker API Configuration               80–210ms
 ├── PUT /machine-config                                19ms
 ├── PUT /boot-source                                   19ms
 ├── PUT /drives/rootfs                                 18ms
 ├── PUT /drives/data                                   18ms
 ├── PUT /network-interfaces/eth0                       19ms
 └── PUT /vsock                                         18ms

 Phase 3: VM Start                                   210–248ms
 └── PUT /actions (InstanceStart)                       38ms

 Phase 4: Host Networking                            248–281ms
 └── iptables DNAT rules (2 rules)                      33ms

 Phase 5: Guest Boot (offsets from kernel start)     248–808ms
 ├── Kernel: "Booting Linux on physical CPU"             0ms
 ├── virtio_blk: block devices probed                  301ms
 ├── virtio_net: MAC assigned                          321ms
 ├── EXT4-fs: rootfs mounted                           389ms
 ├── /sbin/init starts (custom init script)            400ms
 ├── Mounts (proc, sys, dev, shm, tmp)                 420ms
 ├── Network configured (ip addr, route, DNS)          440ms
 ├── sshd forked (pre-baked host keys)                 480ms
 └── sshd listening on :22                             560ms

 Phase 6: Readiness Polling                          281–870ms
 └── TCP connect to port 22 succeeds                   870ms

 Total: 1,133ms
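Phase 1's four steps map naturally onto shell background jobs. Here is a minimal sketch, wrapped in a function because the real commands need root and a Firecracker install; every path, device, and unit name is an example, not our production layout:

```shell
# Sketch of Phase 1 as four parallel jobs. Wall time is the slowest job,
# not the sum, which is why the phase totals 80ms rather than ~133ms.
host_setup() {
  local vm_dir=/srv/vms/vm-1 tap=tap-vm1 vm_ip=172.16.0.2
  # Job 1: CoW clone of the base rootfs (see the reflink section below)
  cp --reflink=auto /srv/images/base-rootfs.ext4 "$vm_dir/rootfs.ext4" &
  # Job 2: TAP device, attached to the bridge, with the stale-ARP flush
  { ip tuntap add dev "$tap" mode tap &&
    ip link set "$tap" master br0 up &&
    ip neigh flush dev br0 to "$vm_ip"; } &
  # Job 3: data volume, a 1 GiB file formatted as ext4
  { dd if=/dev/zero of="$vm_dir/data.ext4" bs=1M count=1024 status=none &&
    mkfs.ext4 -q "$vm_dir/data.ext4"; } &
  # Job 4: the Firecracker process itself, supervised by systemd
  systemd-run --unit fc-vm-1 -- firecracker --api-sock /run/fc-vm-1.sock &
  wait
}
```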

Where the Time Goes

 Phase                          Duration   % of Total
 Host setup (parallel)             80ms           7%
 Firecracker API config           130ms          12%
 InstanceStart                     38ms           3%
 iptables DNAT                     33ms           3%
 Guest boot + readiness poll      852ms          75%
 Total                          1,133ms

Firecracker’s own contribution — loading the kernel and starting vCPUs — is 38ms. Three-quarters of the total time is the Linux kernel booting and the init script getting SSH running inside the guest. The host-side orchestration (everything before InstanceStart plus iptables) is 263ms.

Each VM gets a 4 GB root filesystem cloned from a base image. On ext4, cp writes 4 GB sequentially to NVMe — 1,737ms. On XFS with reflink=1, cp --reflink=auto takes 23ms. It creates a copy-on-write clone sharing the same physical extents. No data moves until the guest writes.

This is the single biggest optimization in the pipeline. It also improved snapshot writes — from 19.5 seconds on ext4 to 1.2 seconds on XFS for a 256 MiB memory dump — because XFS’s extent-based allocator handles Firecracker’s write pattern far better.
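The clone itself is a single cp invocation. A runnable sketch on throwaway files (the sizes and paths are placeholders, not the production 4 GB image); on an XFS mount created with mkfs.xfs -m reflink=1, the same command completes in roughly constant time regardless of file size because only extent metadata is copied:

```shell
# Create a source file standing in for the base rootfs image.
src=$(mktemp)
dst=$(mktemp -u)
head -c 1048576 /dev/urandom > "$src"   # 1 MiB of data for the demo

# CoW clone on XFS (reflink=1) or Btrfs; ordinary full copy elsewhere.
# --reflink=auto falls back silently, so the same command works on any fs.
cp --reflink=auto "$src" "$dst"

cmp -s "$src" "$dst" && echo "clone matches source"
```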

The API Configuration Tax

Six sequential HTTP PUTs to Firecracker’s Unix socket API take 130ms combined — 12% of total boot time. Each call averages 19ms. These calls configure machine specs, kernel path, drives, networking, and vsock before the VM can start.

For workloads that boot many VMs, this is worth noting. Firecracker doesn’t support batched configuration — each resource is a separate API call. On our hardware, that’s a fixed 130ms cost before the kernel even loads.
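The six calls can be sketched as curl requests against the Unix socket. The paths, MAC address, CID, and field values below are example placeholders following the Firecracker v1.x API schema; the function is defined rather than executed, since it needs a live Firecracker process:

```shell
FC_SOCK=/run/firecracker.sock   # example socket path

# Helper: one PUT to the Firecracker API over the Unix socket.
fc_put() {  # usage: fc_put <resource-path> <json-body>
  curl -sf --unix-socket "$FC_SOCK" -X PUT "http://localhost/$1" \
       -H 'Content-Type: application/json' -d "$2"
}

# The six sequential config calls, ~19ms each on our hardware.
configure_vm() {
  fc_put machine-config          '{"vcpu_count": 2, "mem_size_mib": 256}'
  fc_put boot-source             '{"kernel_image_path": "/srv/vmlinux", "boot_args": "console=ttyS0 reboot=k panic=1"}'
  fc_put drives/rootfs           '{"drive_id": "rootfs", "path_on_host": "/srv/vms/vm-1/rootfs.ext4", "is_root_device": true, "is_read_only": false}'
  fc_put drives/data             '{"drive_id": "data", "path_on_host": "/srv/vms/vm-1/data.ext4", "is_root_device": false, "is_read_only": false}'
  fc_put network-interfaces/eth0 '{"iface_id": "eth0", "host_dev_name": "tap-vm1", "guest_mac": "06:00:AC:10:00:02"}'
  fc_put vsock                   '{"guest_cid": 3, "uds_path": "/srv/vms/vm-1/vsock.sock"}'
}
```

Because each call must complete before the next resource can be configured, the latency is additive; there is no pipelining to exploit.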

Guest Boot: 560ms from Kernel to sshd

The kernel boots in ~400ms. Our init script (a shell script running as PID 1) takes another 160ms to reach sshd listening:

 Offset (from kernel start)   Step
 0ms                          Kernel starts on vCPU 0
 301ms                        virtio_blk probes block devices
 389ms                        EXT4 rootfs mounted
 400ms                        Kernel execs /sbin/init
 420ms                        Filesystem mounts complete
 440ms                        Network configured via ip commands
 480ms                        sshd forked with pre-baked host keys
 560ms                        sshd listening (confirmed via /proc/net/tcp)

Kernel boot is CPU-bound. The rootfs is already in host page cache from the reflink copy. Adding more vCPUs doesn’t help — we benchmarked 2 vs 4 vCPUs with no measurable difference. Linux kernel initialization is single-threaded until late in init.


Snapshot Restore: 176ms

This is the number we’re most excited about, and the one we couldn’t find benchmarked elsewhere.

Firecracker can snapshot a running VM — CPU registers, memory contents, device state — and restore it later. We snapshot every VM on idle and restore on demand. Here’s the full restore pipeline:

 Phase 1: Host Setup                                    0–97ms
 ├── Create TAP device + flush ARP                      38ms
 └── Spawn Firecracker via systemd-run                  59ms

 Phase 2: Snapshot Restore                            97–140ms
 ├── PUT /snapshot/load (memory-mapped)                 25ms
 └── PATCH /vm (resume vCPUs)                           18ms

 Phase 3: Host Networking                            140–173ms
 └── iptables DNAT rules                                33ms

 Phase 4: Readiness Check                            173–176ms
 └── TCP connect to port 22 succeeds                     2ms

 Total: 176ms

 Phase                                  Duration   % of Total
 Host setup (TAP + Firecracker spawn)      97ms          55%
 Snapshot load + resume                    43ms          24%
 iptables DNAT                             33ms          19%
 Readiness check                            2ms           1%
 Total                                    176ms
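The two API calls in Phase 2 can be sketched the same way as the boot-time configuration. Snapshot paths here are examples, and the request bodies follow Firecracker's snapshot-load schema; the function is defined rather than executed, since it needs a live Firecracker process:

```shell
FC_SOCK=/run/firecracker.sock   # example socket path

restore_vm() {
  # Load the snapshot: state file plus memory file. The memory file is
  # mapped, not read, so this returns in ~25ms regardless of VM size.
  curl -sf --unix-socket "$FC_SOCK" -X PUT http://localhost/snapshot/load \
       -H 'Content-Type: application/json' \
       -d '{"snapshot_path": "/srv/snapshots/vm-1/state.snap",
            "mem_backend": {"backend_type": "File",
                            "backend_path": "/srv/snapshots/vm-1/mem.snap"},
            "resume_vm": false}'
  # Resume the vCPUs exactly where they were frozen.
  curl -sf --unix-socket "$FC_SOCK" -X PATCH http://localhost/vm \
       -H 'Content-Type: application/json' -d '{"state": "Resumed"}'
}
```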

Compare this to cold boot’s 1,133ms. Restore is 6.4x faster — and the reason is that it skips everything expensive: no rootfs copy, no API configuration, no kernel boot, no init script, no waiting for sshd to start.

Why It’s Fast: mmap, Not Read

Firecracker doesn’t read the snapshot memory file into RAM. It creates a MAP_PRIVATE mapping — the kernel demand-pages memory as the guest accesses it. This means the /snapshot/load call returns in 25ms regardless of VM memory size. A 256 MiB VM and a 4 GiB VM restore in the same time.

The tradeoff: the memory file must remain on disk for the lifetime of the restored VM. First access to each page incurs a page fault. In practice, the working set loads within the first few hundred milliseconds of the guest running, and the latency is invisible to the user.

Why Readiness Is 2ms Instead of 600ms+

sshd was already listening on port 22 when the snapshot was taken. The TCP stack, the bound socket, the process state — all restored from the snapshot. The first TCP connect attempt gets a SYN-ACK in 2ms. No polling loop, no timeout cycles.

This is the single biggest difference between cold boot and restore. Cold boot spends 75% of its time waiting for the Linux kernel to initialize and sshd to start listening. Restore skips all of it — the guest resumes mid-execution, exactly where it was frozen.

End-to-End: Sub-500ms

The 176ms is measured internally — host-side instrumentation around each step. Add API overhead, network latency, and the client’s own connection setup, and the user-facing number lands under 500ms. From the user’s perspective: click, and you’re back where you left off. Processes running, files on disk, terminal history intact.

For context, Firecracker’s own docs and Marc Brooker’s writing mention snapshot restore at the API level (4–10ms for the call itself). Our 25ms for /snapshot/load is in that range. But the full pipeline — spawn a new Firecracker process, load the snapshot, restore networking, confirm reachability — is what actually determines user-facing latency, and we haven’t seen that benchmarked end-to-end elsewhere.


Readiness Detection: TCP Polling vs. Vsock

The host needs to know when the guest is ready to accept connections. We tried two approaches.

TCP Connect Polling (what we use)

const net = require("net");

// Resolves true once sshd answers the SYN, false on refusal or timeout.
const probe = (vmIp) => new Promise((resolve) => {
  const sock = net.createConnection({ host: vmIp, port: 22, timeout: 50 });
  sock.once("connect", () => { sock.destroy(); resolve(true); });
  sock.once("error",   () => { sock.destroy(); resolve(false); });
  sock.once("timeout", () => { sock.destroy(); resolve(false); });
});

50ms connect timeout, 5ms sleep between attempts. During boot, every attempt times out (the guest hasn’t configured networking yet). Once sshd binds port 22, the SYN-ACK comes back in <1ms.

The timeout value matters. TCP packets traverse: host kernel → bridge → TAP → Firecracker → guest virtio-net → guest kernel → back. Under load, the round trip takes 20–50ms. A 15ms timeout causes missed SYN-ACKs and extra polling cycles. 50ms is the sweet spot — long enough to catch responses, short enough to not block on pre-boot attempts.

Vsock Signaling (what we tried)

Firecracker supports vsock — a direct host-guest communication channel. The guest could signal readiness immediately, no polling. On a running VM, vsock round-trip is 3ms. During boot, it’s 341ms.

The reason: Firecracker’s VMM thread — the one handling device emulation — runs a single-threaded epoll event loop. During boot, the kernel reads the rootfs, generating hundreds of virtio-blk I/O completions. The vsock event queues behind all of them:

Guest: vsock connect() → virtio kick
  → epoll queue: [blk, blk, blk, ..., vsock]
  → Firecracker VMM thread processes events sequentially
  → 341ms before vsock event is handled

TCP polling doesn’t have this problem. TCP goes through the host kernel’s network stack (bridge → TAP), which runs on a separate CPU independently of Firecracker’s event loop.

Note: “single-threaded” here refers specifically to the VMM/device-emulation thread. Firecracker runs multiple threads — an API thread, the VMM thread, and one thread per vCPU — but device I/O events are serialized through one epoll loop.


Gotchas

Stale ARP Entries on IP Reuse

When a VM is destroyed and a new one gets the same IP, the host’s ARP table retains the old VM’s MAC address. TCP SYNs route to the wrong MAC for ~4 seconds until the ARP entry expires. We saw boot times jump to 6,500ms on the second run of every benchmark.

Fix: ip neigh flush dev br0 to ${vmIp} when creating the TAP device. One command. All runs became consistent.

set -e in PID 1

Our init script runs as PID 1 with set -e. A binary that failed to connect to a host service returned exit code 1. set -e killed the script. PID 1 died. Kernel panic:

[    0.720869] Kernel panic - not syncing: Attempted to kill init!
               exitcode=0x00000100

Every command in a PID 1 init script that might fail needs || true or explicit error handling. A stray non-zero exit doesn’t crash your app — it crashes the kernel.
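The pattern is easy to demonstrate in an ordinary shell; flaky_step below is a stand-in for the failing binary:

```shell
# Runnable sketch of the guard pattern. Without the `|| true`, `set -e`
# terminates the script on the non-zero return, and when the script is
# PID 1 that termination becomes a kernel panic.
set -e
flaky_step() { return 1; }   # stand-in for a binary that exits non-zero
flaky_step || true           # failure absorbed; PID 1 keeps running
echo "init survived"         # prints "init survived"
```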


What Doesn’t Help

More vCPUs. 4 vCPUs vs 2 — no measurable difference in boot time. Kernel init is single-threaded.

Faster storage. The rootfs is in host page cache after the reflink copy. Boot is CPU-bound. NVMe vs ramdisk wouldn’t change the numbers.

Batched API configuration. Firecracker requires one PUT per resource. There’s no way to send all six config calls at once. The 130ms API tax is fixed.

Golden snapshots (boot once, snapshot, clone for every new VM). Would cut cold boot to ~200ms. But the kernel command line — which carries network config, SSH keys, environment variables — is frozen in the snapshot. Reconfiguring after restore requires a vsock agent, per-memory-size snapshot variants, and rootfs consistency guarantees. The complexity exceeds the benefit for our use case.


Alpine vs. Ubuntu

 Operation                     Alpine 256 MiB   Ubuntu 256 MiB
 Cold boot (internal)                 1,133ms          1,450ms
 Cold boot (API poll)                 1,450ms          1,770ms
 SSH ready (API poll)                 1,855ms          2,200ms
 Snapshot restore (internal)            176ms            176ms
 Snapshot restore (API poll)            350ms            325ms

Ubuntu’s cold boot is ~300ms slower — larger rootfs means more page cache misses on the first boot. On subsequent boots with a warm cache, the gap narrows. Snapshot restore is identical because the memory-mapped restore path is independent of rootfs size.

The “API poll” numbers include a 250ms polling interval from our external benchmark harness — not real added latency.


Summary

 Metric                                    Value
 Cold boot (nothing → SSH ready)         1,133ms
 Firecracker InstanceStart                  38ms
 Host-side orchestration                   263ms
 Guest kernel + init                       560ms
 Snapshot restore (nothing → SSH ready)    176ms
 Snapshot load (PUT /snapshot/load)         25ms

The headline number — 1.1 seconds for a cold boot — breaks down to 263ms of host setup, 38ms of Firecracker doing its job, and 560ms of Linux booting. Snapshot restore sidesteps the guest boot entirely: 176ms total, with the snapshot itself loading in 25ms via mmap.

If you’re building on Firecracker and want to compare numbers, the test configuration is: Firecracker v1.14.2, AWS EC2 ARM (Graviton), NVMe/XFS with reflink, Alpine Linux 3.21 minimal rootfs (~1 GB), 256 MiB RAM, 2 vCPUs, custom init script (no systemd/OpenRC). We’re happy to share methodology details — reach out at hello@numavm.com.


We built numaVM on Firecracker to give developers and AI agents their own Linux machines with persistent storage and sub-500ms snapshot restore. numavm.com