Re: [Qemu-devel] [QEMU PATCH v2 0/2]: KVM: i386: Add support for save and restore nested state

"Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> · Fri, 2 Nov 2018 12:35:05 +0000

* Paolo Bonzini (pbonzini@xxxxxxxxxx) wrote:
> On 02/11/2018 04:46, Liran Alon wrote:
> >> On Thu, Nov1, 2018 at 09:45 AM, Jim Mattson <jmattson@xxxxxxxxxx> wrote:
> > 
> >>> On Thu, Nov 1, 2018 at 8:56 AM, Dr. David Alan Gilbert <dgilbert@xxxxxxxxxx> wrote:
> > 
> >>> So if I have matching host kernels it should always work?
> >>> What happens if I upgrade the source kernel to increase it's maximum
> >>> nested size, can I force it to keep things small for some VMs?
> > 
> >> Any change to the format of the nested state should be gated by a
> >> KVM_CAP set by userspace. (Unlike, say, how the
> >> KVM_VCPUEVENT_VALID_SMM flag was added to the saved VCPU events state
> >> in commit f077825a8758d.) KVM has traditionally been quite bad about
> >> maintaining backwards compatibility, but I hope the community is more
> >> cognizant of the issues now.
> > 
> >> As a cloud provider, one would only enable the new capability from
> >> userspace once all hosts in the pool have a kernel that supports it.
> >> During the transition, the capability would not be enabled on the
> >> hosts with a new kernel, and these hosts would continue to provide
> >> nested state that could be consumed by hosts running the older kernel.
> > 
> > Hmm this makes sense.
> > 
> > This means though that the patch I have submitted here isn't good enough.
> > My patch currently assumes that when it attempts to get nested state from KVM,
> > QEMU should always set nested_state->size to max size supported by KVM as received
> > from kvm_check_extension(s, KVM_CAP_NESTED_STATE);
> > (See kvm_get_nested_state() introduced on my patch).
> > This indeed won't allow migration from host with new KVM to host with old KVM if
> > nested_state size was enlarged between these KVM versions.
> > Which is obviously an issue.
> 
> Actually I think this is okay, because unlike the "new" capability was
> enabled, KVM would always reduce nested_state->size to a value that is
> compatible with current kernels.
> 
> > But on second thought, I'm not sure that this is the right approach as-well.
> > We don't really want the used version of nested_state to be determined on kvm_init().
> > * On source QEMU, we actually want to determine it when preparing for migration based
> > on to the support given by our destination host. If it's an old host, we would like to
> > save an old version nested_state and if it's a new host, we will like to save our newest
> > supported nested_state.
> 
> No, that's wrong because it would lead to losing state.  If the source
> QEMU supports more state than the destination QEMU, and the current VM
> state needs to transmit it for migration to be _correct_, then migration
> to that destination QEMU must fail.
> 
> In particular, enabling the new KVM capability needs to be gated by a
> new machine type and/or -cpu flag, if migration compatibility is needed.
>  (In particular, this is one reason why I haven't considered this series
> for 3.1.  Right now, migration of nested hypervisors is completely
> busted but if we make it "almost" work, pre-3.1 machine types would not
> ever be able to add support for KVM_CAP_EXCEPTION_PAYLOAD.  Therefore,
> it's better for users if we wait for one release more, and add support
> for KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD at the same time).
> 
> Personally, I would like to say that, starting from QEMU 3.2, enabling
> nested VMX requires a 4.20 kernel.  It's a bit bold, but I think it's a
> good way to keep some sanity.  Any opinions on that?

That seems a bit mean; there's a lot of people already using nested.

Dave

> Paolo
> 
> > Therefore, I don't think that we want this versioning to be based on KVM_CAP at all.
> > It seems that we would want the process to behave as follows:
> > 1) Mgmt-layer at dest queries dest host max supported nested_state size.
> >    (Which should be returned from kvm_check_extension(KVM_CAP_NESTED_STATE))
> > 2) Mgmt-layer at source initiate migration to dest with requesting QEMU to send nested_state 
> >    matching dest max supported nested_state size.
> >    When saving nested state using KVM_GET_NESTED_STATE IOCTL, QEMU will specify in nested_state->size
> >    the *requested* size to be saved and KVM should be able to save only the information which matches
> >    the version that worked with that size.
> > 3) After some sanity checks on received migration stream, dest host use KVM_SET_NESTED_STATE IOCTL.
> >    This IOCTL should deduce which information it should deploy based on given nested_state->size.
> > 
> > This also makes me wonder if it's not just nicer to use nested_state->flags to specify which
> > information is actually present on nested_state instead of managing versioning with nested_state->size.
> > 
> > What are your opinions on this?
> > 
> > -Liran
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK