Re: [Qemu-devel] [QEMU PATCH v2 0/2]: KVM: i386: Add support for save and restore nested state

"Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> · Fri, 2 Nov 2018 16:58:35 +0000

* Daniel P. Berrangé (berrange@xxxxxxxxxx) wrote:
> On Fri, Nov 02, 2018 at 10:40:35AM +0100, Paolo Bonzini wrote:
> > On 02/11/2018 04:46, Liran Alon wrote:
> > >> On Thu, Nov1, 2018 at 09:45 AM, Jim Mattson <jmattson@xxxxxxxxxx> wrote:
> > > 
> > >>> On Thu, Nov 1, 2018 at 8:56 AM, Dr. David Alan Gilbert <dgilbert@xxxxxxxxxx> wrote:
> > > 
> > >>> So if I have matching host kernels it should always work?
> > >>> What happens if I upgrade the source kernel to increase it's maximum
> > >>> nested size, can I force it to keep things small for some VMs?
> > > 
> > >> Any change to the format of the nested state should be gated by a
> > >> KVM_CAP set by userspace. (Unlike, say, how the
> > >> KVM_VCPUEVENT_VALID_SMM flag was added to the saved VCPU events state
> > >> in commit f077825a8758d.) KVM has traditionally been quite bad about
> > >> maintaining backwards compatibility, but I hope the community is more
> > >> cognizant of the issues now.
> > > 
> > >> As a cloud provider, one would only enable the new capability from
> > >> userspace once all hosts in the pool have a kernel that supports it.
> > >> During the transition, the capability would not be enabled on the
> > >> hosts with a new kernel, and these hosts would continue to provide
> > >> nested state that could be consumed by hosts running the older kernel.
> > > 
> > > Hmm this makes sense.
> > > 
> > > This means though that the patch I have submitted here isn't good enough.
> > > My patch currently assumes that when it attempts to get nested state from KVM,
> > > QEMU should always set nested_state->size to max size supported by KVM as received
> > > from kvm_check_extension(s, KVM_CAP_NESTED_STATE);
> > > (See kvm_get_nested_state() introduced on my patch).
> > > This indeed won't allow migration from host with new KVM to host with old KVM if
> > > nested_state size was enlarged between these KVM versions.
> > > Which is obviously an issue.
> > 
> > Actually I think this is okay, because unlike the "new" capability was
> > enabled, KVM would always reduce nested_state->size to a value that is
> > compatible with current kernels.
> > 
> > > But on second thought, I'm not sure that this is the right approach as-well.
> > > We don't really want the used version of nested_state to be determined on kvm_init().
> > > * On source QEMU, we actually want to determine it when preparing for migration based
> > > on to the support given by our destination host. If it's an old host, we would like to
> > > save an old version nested_state and if it's a new host, we will like to save our newest
> > > supported nested_state.
> > 
> > No, that's wrong because it would lead to losing state.  If the source
> > QEMU supports more state than the destination QEMU, and the current VM
> > state needs to transmit it for migration to be _correct_, then migration
> > to that destination QEMU must fail.
> > 
> > In particular, enabling the new KVM capability needs to be gated by a
> > new machine type and/or -cpu flag, if migration compatibility is needed.
> >  (In particular, this is one reason why I haven't considered this series
> > for 3.1.  Right now, migration of nested hypervisors is completely
> > busted but if we make it "almost" work, pre-3.1 machine types would not
> > ever be able to add support for KVM_CAP_EXCEPTION_PAYLOAD.  Therefore,
> > it's better for users if we wait for one release more, and add support
> > for KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD at the same time).
> > 
> > Personally, I would like to say that, starting from QEMU 3.2, enabling
> > nested VMX requires a 4.20 kernel.  It's a bit bold, but I think it's a
> > good way to keep some sanity.  Any opinions on that?
> 
> We have usually followed a rule that new machine types must not
> affect runability of a VM on a host. IOW new machine types should
> not introduce dependancies on specific kernels, or hardware features
> such as CPU flags.
> 
> Anything that requires a new kernel feature thus ought to be an
> opt-in config tunable on the CLI, separate from machine type
> choice.  Alternatively in this case, it could potentially be a
> migration parameter settable via QMP. QEMU on each side could
> advertize whether the migration parameter is available, and
> the mgmt app (which can see both sides of the migration) can
> then decide whether to enable it.

This is a little odd though since it relates to the
contents/size/consistency of the guest state directly.

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK