Is nested VMX virtualization in the Linux kernel really that stable?
The technical details are a lot more complex than most realize.
Single-level VMX virtualization is relatively straightforward, even if there are a lot of details to juggle with VMCS setup and handling exits.
Nested virtualization is a whole other animal, as one now has to handle not just the extra level but many things the hardware normally does, plus juggle internal state during transitions between levels.
The LKML is filled with discussions and debates where very sharp contributors are trying to make sense of how it would work.
Amazon turning the feature on is one thing. It working 100% perfectly is quite another…
Fair concern, but this has been quietly production-stable on GCP and Azure since 2017 — that's 8+ years at cloud scale. The LKML debates you're referencing are mostly about edge cases in exotic VMX features (nested APIC virtualization, SGX passthrough), not the core nesting path that workloads like Firecracker and Kata actually exercise.
The more interesting signal is that AWS is restricting this to 8th-gen Intel instances only (c8i/m8i/r8i). They're likely leveraging specific microarchitectural improvements in those chips for VMCS shadowing — picking the hardware generation where they can guarantee their reliability bar rather than enabling it broadly and dealing with errata on older silicon. That's actually the careful engineering approach you'd want from a cloud provider.
It's been around for almost 15 years and stable enough for several providers to roll it out in production over the past decade (GCP and Azure since 2017).
AWS is just late to the game because they've rolled so much of their own stack instead of adapting open source solutions and contributing back to them.
> AWS is just late to the game because they've rolled so much of their own stack instead of adapting open source solutions and contributing back to them.
This is emphatically not true. Contributing to KVM and the kernel (which AWS does anyway) would not have accelerated the availability.
EC2 is not just a data center with commodity equipment. They have customer demands for security and performance that far exceed what one can build with a pile of OSS, to the extent that they build their own compute and networking hardware. They even have CPU and other hardware SKUs not available to the general public.
If my sources are correct, GCP did not launch on dedicated hardware like EC2 did, which raised customer concerns about isolation guarantees. (Not sure if that’s still the case.) And Azure didn’t have hardware-assisted I/O virtualization ("Azure Boost") until just a few years ago and it's not as mature as Nitro.
Even today, Azure doesn’t support nested virtualization the way one might ordinarily expect them to. It's only supported with Hyper-V on the guest, i.e., Windows.
> While nested virtualization is technically possible while using runners, it is not officially supported. Any use of nested VMs is experimental and done at your own risk, we offer no guarantees regarding stability, performance, or compatibility.