* Kashyap Chamarthy (kchamart@xxxxxxxxxx) wrote:
> This is a rewrite of the Wiki page:
>
>     https://www.linux-kvm.org/page/Nested_Guests
>
> Signed-off-by: Kashyap Chamarthy <kchamart@xxxxxxxxxx>
> ---
> Question: is the live migration of L1-with-L2-running-in-it fixed for
> *all* architectures, including s390x?
> ---
>  .../virt/kvm/running-nested-guests.rst | 171 ++++++++++++++++++
>  1 file changed, 171 insertions(+)
>  create mode 100644 Documentation/virt/kvm/running-nested-guests.rst
>
> diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..e94ab665c71a36b7718aebae902af16b792f6dd3
> --- /dev/null
> +++ b/Documentation/virt/kvm/running-nested-guests.rst
> @@ -0,0 +1,171 @@
> +Running nested guests with KVM
> +==============================
> +
> +A nested guest is a KVM guest that in turn runs on a KVM guest::

Note nesting may be a little more general; e.g. L1 might be another
OS/hypervisor that wants to run its own L2; and similarly KVM might be
the L1 under someone else's hypervisor.
I think this doc is mostly about the case of KVM being the L0 and
wanting to run an L1 that's capable of running an L2.

> +              .----------------.  .----------------.
> +              |                |  |                |
> +              |       L2       |  |       L2       |
> +              | (Nested Guest) | | (Nested Guest) |
> +              |                |  |                |
> +              |----------------'--'----------------|
> +              |                                    |
> +              |       L1 (Guest Hypervisor)        |
> +              |           KVM (/dev/kvm)           |
> +              |                                    |
> +     .------------------------------------------------------.
> +     |                 L0 (Host Hypervisor)                 |
> +     |                    KVM (/dev/kvm)                    |
> +     |------------------------------------------------------|
> +     |                  x86 Hardware (VMX)                  |
> +     '------------------------------------------------------'

This is now x86 specific but the doc is in a general directory; I'm
not sure what other architecture nesting rules are.
Worth having VMX/SVM at least.

> +
> +Terminology:
> +
> +- L0 – level-0; the bare metal host, running KVM
> +
> +- L1 – level-1 guest; a VM running on L0; also called the "guest
> +  hypervisor", as it itself is capable of running KVM
> +
> +- L2 – level-2 guest; a VM running on L1; this is the "nested guest"
> +
> +
> +Use Cases
> +---------
> +
> +An additional layer of virtualization can sometimes be useful.  You
> +might have access to a large virtual machine in a cloud environment
> +that you want to compartmentalize into multiple workloads.  You
> +might be running a lab environment in a training session.

Lose this paragraph, and just use the list below?

> +There are several scenarios where nested KVM can be useful:
> +
> +- As a developer, you want to test your software on different OSes.
> +  Instead of renting multiple VMs from a cloud provider, using
> +  nested KVM lets you rent a large enough "guest hypervisor"
> +  (level-1 guest).  This in turn allows you to create multiple
> +  nested guests (level-2 guests), running different OSes, on which
> +  you can develop and test your software.
> +
> +- Live migration of "guest hypervisors" and their nested guests, for
> +  load balancing, disaster recovery, etc.
> +
> +- Using VMs for isolation (as in Kata Containers, and before it
> +  Clear Containers -- https://lwn.net/Articles/644675/), if you're
> +  running on a cloud provider that is already using virtual
> +  machines.

Some others that might be worth listing:

- VM image creation tools (e.g. virt-install etc.) often run their
  own VM, and users expect these to work inside a VM.

- Some other OSes use virtualization internally for other
  features/protection.
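It might also be worth a line on how to check, from inside L1, that
the virt extensions actually got exposed; just a sketch, something
like the below (the count is simply the number of vCPUs in my test
guest, and on an AMD host you'd see svm rather than vmx):

  $ grep -cw -E 'vmx|svm' /proc/cpuinfo
  4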
> +Procedure to enable nesting on the bare metal host
> +--------------------------------------------------
> +
> +The KVM kernel modules do not enable nesting by default (though your
> +distribution may override this default).

It's the other way around; see commit 1e58e5e, which made it the
default for Intel; AMD has had it on by default for longer.

> To enable nesting, set the
> +``nested`` module parameter to ``Y`` or ``1``.  You may set this
> +parameter persistently in a file in ``/etc/modprobe.d`` in the L0
> +host:
> +
> +1. On the bare metal host (L0), list the kernel modules, and ensure
> +   that the KVM modules are loaded::
> +
> +     $ lsmod | grep -i kvm
> +     kvm_intel             133627  0
> +     kvm                   435079  1 kvm_intel
> +
> +2. Show information for the ``kvm_intel`` module::
> +
> +     $ modinfo kvm_intel | grep -i nested
> +     parm:           nested:bool
> +
> +3. To make the nested KVM configuration persistent across reboots,
> +   place the below entry in a config file::
> +
> +     $ cat /etc/modprobe.d/kvm_intel.conf
> +     options kvm-intel nested=y
> +
> +4. Unload and re-load the KVM Intel module::
> +
> +     $ sudo rmmod kvm-intel
> +     $ sudo modprobe kvm-intel
> +
> +5. Verify that the ``nested`` parameter for KVM is enabled::
> +
> +     $ cat /sys/module/kvm_intel/parameters/nested
> +     Y
> +
> +For AMD hosts, the process is the same as above, except that the
> +module name is ``kvm-amd``.
> +
> +Once your bare metal host (L0) is configured for nesting, you should
> +be able to start an L1 guest with ``qemu-kvm -cpu host`` (which
> +passes through the host CPU's capabilities as-is to the guest); or,
> +for better live migration compatibility, use a named CPU model
> +supported by QEMU, e.g. ``-cpu Haswell-noTSX-IBRS,vmx=on``, and the
> +guest will subsequently be capable of running an L2 guest with
> +accelerated KVM.
> +
> +Additional nested-related kernel parameters
> +-------------------------------------------
> +
> +If your hardware is sufficiently advanced (an Intel Haswell
> +processor or above, which has newer hardware virt extensions), you
> +might want to enable additional features: "Shadow VMCS (Virtual
> +Machine Control Structure)" and APIC Virtualization on your bare
> +metal host (L0).  Parameters for Intel hosts::
> +
> +     $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
> +     Y
> +
> +     $ cat /sys/module/kvm_intel/parameters/enable_apicv
> +     N
> +
> +     $ cat /sys/module/kvm_intel/parameters/ept
> +     Y

Don't those happen automatically (mostly)?

> +Again, to persist the above values across reboots, append them to
> +``/etc/modprobe.d/kvm_intel.conf``::
> +
> +     options kvm-intel nested=y
> +     options kvm-intel enable_shadow_vmcs=y
> +     options kvm-intel enable_apicv=y
> +     options kvm-intel ept=y
> +
> +
> +Live migration with nested KVM
> +------------------------------
> +
> +The below live migration scenarios should work as of Linux kernel
> +5.3 and QEMU 4.2.0.  In all the below cases, L1 exposes
> +``/dev/kvm``, i.e. the L2 guest is a "KVM-accelerated guest", not a
> +"plain emulated guest" (as done by QEMU's TCG):
> +
> +- Migrating a nested guest (L2) to another L1 guest on the *same*
> +  bare metal host.
> +
> +- Migrating a nested guest (L2) to another L1 guest on a *different*
> +  bare metal host.
> +
> +- Migrating an L1 guest, with an *offline* nested guest in it, to
> +  another bare metal host.
> +
> +- Migrating an L1 guest, with a *live* nested guest in it, to
> +  another bare metal host.
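For that last case, it might help to show the bare QEMU incantation
people can use to try it out; only a sketch, and the destination host
name and port below are made up:

  # On the destination bare metal host, start a QEMU instance that
  # waits for an incoming migration:
  $ qemu-system-x86_64 -enable-kvm -cpu Haswell-noTSX-IBRS,vmx=on \
      -incoming tcp:0:4444 ...

  # On the source QEMU's HMP monitor, kick off the migration in the
  # background, then poll its status:
  (qemu) migrate -d tcp:dest-host:4444
  (qemu) info migrate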
> +
> +
> +Limitations on Linux kernel versions older than 5.3
> +---------------------------------------------------
> +
> +On Linux kernel versions older than 5.3, once an L1 guest has
> +started an L2 guest, the L1 guest would no longer be capable of
> +being migrated, saved, or loaded (refer to the QEMU documentation on
> +"save"/"load") until the L2 guest shuts down.  [FIXME: Is this
> +limitation fixed for *all* architectures, including s390x?]
> +
> +Attempting to migrate, or save and load, an L1 guest while an L2
> +guest is running will result in undefined behavior.  You might see a
> +``kernel BUG!`` entry in ``dmesg``, a kernel 'oops', or an outright
> +kernel panic.  Such a migrated or loaded L1 guest can no longer be
> +considered stable or secure, and must be restarted.
> +
> +Migrating an L1 guest merely configured to support nesting, while
> +not actually running L2 guests, is expected to function normally.
> +Live-migrating an L2 guest from one L1 guest to another is also
> +expected to succeed.

Can you add an entry along the lines of 'reporting bugs with nesting'
that explains you should clearly state what the host CPU is, and the
exact OS and hypervisor config in L0, L1 and L2?

Dave

> --
> 2.21.0

--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK