On Fri, Feb 07, 2020 at 04:46:53PM +0100, Cornelia Huck wrote:
> On Fri, 7 Feb 2020 16:30:02 +0100
> Kashyap Chamarthy <kchamart@xxxxxxxxxx> wrote:
>

[...]

> > ---
> >  .../virt/kvm/running-nested-guests.rst | 171 ++++++++++++++++++
> >  1 file changed, 171 insertions(+)
> >  create mode 100644 Documentation/virt/kvm/running-nested-guests.rst
>
> FWIW, there's currently a series converting this subdirectory to rst
> on-list.

I see, noted. I hope there won't be any conflict, as this is a new file
addition.

> >
> > diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..e94ab665c71a36b7718aebae902af16b792f6dd3
> > --- /dev/null
> > +++ b/Documentation/virt/kvm/running-nested-guests.rst
> > @@ -0,0 +1,171 @@
> > +Running nested guests with KVM
> > +==============================
>
> I think the common style is to also have a "===..." line on top.

Will add. (Just that some projects don't use it; others do. :-))

> > +
> > +A nested guest is a KVM guest that in turn runs on a KVM guest::
> > +
> > +              .----------------.  .----------------.
> > +              |                |  |                |
> > +              |      L2        |  |      L2        |
> > +              | (Nested Guest) |  | (Nested Guest) |
> > +              |                |  |                |
> > +              |----------------'--'----------------|
> > +              |                                    |
> > +              |       L1 (Guest Hypervisor)        |
> > +              |          KVM (/dev/kvm)            |
> > +              |                                    |
> > +      .------------------------------------------------------.
> > +      |                 L0 (Host Hypervisor)                 |
> > +      |                    KVM (/dev/kvm)                    |
> > +      |------------------------------------------------------|
> > +      |                  x86 Hardware (VMX)                  |
>
> Just 'Hardware'? I don't think you want to make this x86-specific?

Good point, will make it more generic.

>
> > +      '------------------------------------------------------'
> > +
> > +
> > +Terminology:
> > +
> > +  - L0 – level-0; the bare metal host, running KVM
> > +
> > +  - L1 – level-1 guest; a VM running on L0; also called the "guest
> > +    hypervisor", as it itself is capable of running KVM.
> > +
> > +  - L2 – level-2 guest; a VM running on L1, this is the "nested guest"
> > +
> > +
> > +Use Cases
> > +---------
> > +
> > +An additional layer of virtualization sometimes can . You
>
> Something seems to be missing here?

Err, broken sentence while rewriting (perils of distraction). I'll fix it.

> > +might have access to a large virtual machine in a cloud environment that
> > +you want to compartmentalize into multiple workloads. You might be
> > +running a lab environment in a training session.
> > +
> > +There are several scenarios where nested KVM can be Useful:
>
> s/Useful/useful/

Will fix in v2.

[...]

> > +    $ cat /sys/module/kvm_intel/parameters/nested
> > +    Y
> > +
> > +For AMD hosts, the process is the same as above, except that the module
> > +name is ``kvm-amd``.
>
> This looks x86-specific. Don't know about others, but s390 has one
> module, also a 'nested' parameter, which is mutually exclusive with a
> 'hpage' parameter.

Fair point, I'll add a separate section for all relevant architectures.
Thanks for pointing it out.

> > +
> > +Once your bare metal host (L0) is configured for nesting, you should be
> > +able to start an L1 guest with ``qemu-kvm -cpu host`` (which passes
> > +through the host CPU's capabilities as-is to the guest); or for better
> > +live migration compatibility, use a named CPU model supported by QEMU,
> > +e.g.: ``-cpu Haswell-noTSX-IBRS,vmx=on`` and the guest will subsequently
> > +be capable of running an L2 guest with accelerated KVM.
>
> That's probably more something that should go into a section that gives
> an example how to start a nested guest with QEMU? Cpu models also look
> different between architectures.

Yeah, I wondered about it. I'll add a simple, representative example.

[...]

> > +Again, to persist the above values across reboot, append them to
> > +``/etc/modprobe.d/kvm_intel.conf``::
> > +
> > +    options kvm-intel nested=y
> > +    options kvm-intel enable_shadow_vmcs=y
> > +    options kvm-intel enable_apivc=y
> > +    options kvm-intel ept=y
>
> x86 specific -- maybe reorganize this document by starting with a
> general setup section and then giving some architecture-specific
> information?

Yeah, good point. Sorry, I was too x86-centric as I tend to just work
with x86 machines. Reorganizing it as you suggest sounds good.

[...]

> > +Limitations on Linux kernel versions older than 5.3
> > +---------------------------------------------------
> > +
> > +On Linux kernel versions older than 5.3, once an L1 guest has started an
> > +L2 guest, the L1 guest would no longer capable of being migrated, saved,
> > +or loaded (refer to QEMU documentation on "save"/"load") until the L2
> > +guest shuts down. [FIXME: Is this limitation fixed for *all*
> > +architectures, including s390x?]
>
> I don't think we ever had that limitation on s390x, since the whole way
> control blocks etc. are handled is different there. David (H), do you
> remember?

I see, I was just not sure. Thought I better ask on the list :-)

Thank you for the quick review!

[...]

-- 
/kashyap