Re: [PATCH v2] docs/virt/kvm: Document running nested guests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 20 Apr 2020 13:17:55 +0200
Kashyap Chamarthy <kchamart@xxxxxxxxxx> wrote:

> This is a rewrite of this[1] Wiki page with further enhancements.  The
> doc also includes a section on debugging problems in nested
> environments.
> 
> [1] https://www.linux-kvm.org/page/Nested_Guests
> 
> Signed-off-by: Kashyap Chamarthy <kchamart@xxxxxxxxxx>
> ---
> v1 is here: https://marc.info/?l=kvm&m=158108941605311&w=2
> 
> In v2:
>   - Address Cornelia's feedback v1:
>     https://marc.info/?l=kvm&m=158109042605606&w=2
>   - Address Dave's feedback from v1:
>     https://marc.info/?l=kvm&m=158109134905930&w=2
> ---
>  .../virt/kvm/running-nested-guests.rst        | 275 ++++++++++++++++++
>  1 file changed, 275 insertions(+)
>  create mode 100644 Documentation/virt/kvm/running-nested-guests.rst
> 
> diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..c6c9ccfa0c00e3cbfd65782ceae962b7ef52b34b
> --- /dev/null
> +++ b/Documentation/virt/kvm/running-nested-guests.rst
> @@ -0,0 +1,275 @@
> +==============================
> +Running nested guests with KVM
> +==============================
> +
> +A nested guest is the ability to run a guest inside another guest (it
> +can be KVM-based or a different hypervisor).  The straightforward
> +example is a KVM guest that in turn runs on KVM a guest (the rest of

s/on KVM a guest/on a KVM guest/

> +this document is built on this example)::
> +
> +              .----------------.  .----------------.
> +              |                |  |                |
> +              |      L2        |  |      L2        |
> +              | (Nested Guest) |  | (Nested Guest) |
> +              |                |  |                |
> +              |----------------'--'----------------|
> +              |                                    |
> +              |       L1 (Guest Hypervisor)        |
> +              |          KVM (/dev/kvm)            |
> +              |                                    |
> +      .------------------------------------------------------.
> +      |                 L0 (Host Hypervisor)                 |
> +      |                    KVM (/dev/kvm)                    |
> +      |------------------------------------------------------|
> +      |        Hardware (with virtualization extensions)     |
> +      '------------------------------------------------------'
> +
> +Terminology:
> +
> +- L0 – level-0; the bare metal host, running KVM
> +
> +- L1 – level-1 guest; a VM running on L0; also called the "guest
> +  hypervisor", as it itself is capable of running KVM.
> +
> +- L2 – level-2 guest; a VM running on L1, this is the "nested guest"
> +
> +.. note:: The above diagram is modelled after x86 architecture; s390x,

s/x86 architecture/the x86 architecture/

> +          ppc64 and other architectures are likely to have different

s/to have/to have a/

> +          design for nesting.
> +
> +          For example, s390x has an additional layer, called "LPAR
> +          hypervisor" (Logical PARtition) on the baremetal, resulting in
> +          "four levels" in a nested setup — L0 (bare metal, running the
> +          LPAR hypervisor), L1 (host hypervisor), L2 (guest hypervisor),
> +          L3 (nested guest).

What about:

"For example, s390x always has an LPAR (LogicalPARtition) hypervisor
running on bare metal, adding another layer and resulting in at least
four levels in a nested setup..."

> +
> +          This document will stick with the three-level terminology (L0,
> +          L1, and L2) for all architectures; and will largely focus on
> +          x86.
> +
> +

(...)

> +Enabling "nested" (s390x)
> +-------------------------
> +
> +1. On the host hypervisor (L0), enable the ``nested`` parameter on
> +   s390x::
> +
> +    $ rmmod kvm
> +    $ modprobe kvm nested=1
> +
> +.. note:: On s390x, the kernel parameter ``hpage`` parameter is mutually

Drop one of the "parameter"?

> +          exclusive with the ``nested`` paramter; i.e. to have
> +          ``nested`` enabled you _must_ disable the ``hpage`` parameter.

"i.e., in order to be able to enable ``nested``, the ``hpage``
parameter _must_ be disabled."

?

> +
> +2. The guest hypervisor (L1) must be allowed to have ``sie`` CPU

"must be provided with" ?

> +   feature — with QEMU, this is possible by using "host passthrough"

s/this is possible by/this can be done by e.g./ ?

> +   (via the command-line ``-cpu host``).
> +
> +3. Now the KVM module can be enabled in the L1 (guest hypervisor)::

s/enabled/loaded/

> +
> +    $ modprobe kvm
> +
> +
> +Live migration with nested KVM
> +------------------------------
> +
> +The below live migration scenarios should work as of Linux kernel 5.3
> +and QEMU 4.2.0.  In all the below cases, L1 exposes ``/dev/kvm`` in
> +it, i.e. the L2 guest is a "KVM-accelerated guest", not a "plain
> +emulated guest" (as done by QEMU's TCG).

The 5.3/4.2 versions likely apply to x86? Should work for s390x as well
as of these version, but should have worked earlier already :)

> +
> +- Migrating a nested guest (L2) to another L1 guest on the *same* bare
> +  metal host.
> +
> +- Migrating a nested guest (L2) to another L1 guest on a *different*
> +  bare metal host.
> +
> +- Migrating an L1 guest, with an *offline* nested guest in it, to
> +  another bare metal host.
> +
> +- Migrating an L1 guest, with a  *live* nested guest in it, to another
> +  bare metal host.
> +
> +Limitations on Linux kernel versions older than 5.3
> +---------------------------------------------------
> +
> +On x86 systems-only (as this does *not* apply for s390x):

Add a "x86" marker? Or better yet, group all the x86 stuff in an x86
section?

> +
> +On Linux kernel versions older than 5.3, once an L1 guest has started an
> +L2 guest, the L1 guest would no longer capable of being migrated, saved,
> +or loaded (refer to QEMU documentation on "save"/"load") until the L2
> +guest shuts down.
> +
> +Attempting to migrate or save-and-load an L1 guest while an L2 guest is
> +running will result in undefined behavior.  You might see a ``kernel
> +BUG!`` entry in ``dmesg``, a kernel 'oops', or an outright kernel panic.
> +Such a migrated or loaded L1 guest can no longer be considered stable or
> +secure, and must be restarted.
> +
> +Migrating an L1 guest merely configured to support nesting, while not
> +actually running L2 guests, is expected to function normally.
> +Live-migrating an L2 guest from one L1 guest to another is also expected
> +to succeed.
> +
> +Reporting bugs from "nested" setups
> +-----------------------------------
> +
> +(This is written with x86 terminology in mind, but similar should apply
> +for other architectures.)

Better to reorder it a bit (see below).

> +
> +Debugging "nested" problems can involve sifting through log files across
> +L0, L1 and L2; this can result in tedious back-n-forth between the bug
> +reporter and the bug fixer.
> +
> +- Mention that you are in a "nested" setup.  If you are running any kind
> +  of "nesting" at all, say so.  Unfortunately, this needs to be called
> +  out because when reporting bugs, people tend to forget to even
> +  *mention* that they're using nested virtualization.
> +
> +- Ensure you are actually running KVM on KVM.  Sometimes people do not
> +  have KVM enabled for their guest hypervisor (L1), which results in
> +  them running with pure emulation or what QEMU calls it as "TCG", but
> +  they think they're running nested KVM.  Thus confusing "nested Virt"
> +  (which could also mean, QEMU on KVM) with "nested KVM" (KVM on KVM).
> +
> +- What information to collect?  The following; it's not an exhaustive
> +  list, but a very good starting point:
> +
> +  - Kernel, libvirt, and QEMU version from L0
> +
> +  - Kernel, libvirt and QEMU version from L1
> +
> +  - QEMU command-line of L1 -- preferably full log from
> +    ``/var/log/libvirt/qemu/instance.log``

(if you are running libvirt)

> +
> +  - QEMU command-line of L2 -- preferably full log from
> +    ``/var/log/libvirt/qemu/instance.log``

(if you are running libvirt)

> +
> +  - Full ``dmesg`` output from L0
> +
> +  - Full ``dmesg`` output from L1
> +
> +  - Output of: ``x86info -a`` (& ``lscpu``) from L0
> +
> +  - Output of: ``x86info -a`` (& ``lscpu``) from L1

lscpu makes sense for other architectures as well.

> +
> +  - Output of: ``dmidecode`` from L0
> +
> +  - Output of: ``dmidecode`` from L1

This looks x86 specific? Maybe have a list of things that make sense
everywhere, and list architecture-specific stuff in specific
subsections?





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux