* Liran Alon (liran.alon@xxxxxxxxxx) wrote:
> 
> 
> > On 1 Nov 2018, at 17:56, Dr. David Alan Gilbert <dgilbert@xxxxxxxxxx> wrote:
> > 
> > * Liran Alon (liran.alon@xxxxxxxxxx) wrote:
> >> 
> >> 
> >>> On 1 Nov 2018, at 15:10, Dr. David Alan Gilbert <dgilbert@xxxxxxxxxx> wrote:
> >>> 
> >>> * Liran Alon (liran.alon@xxxxxxxxxx) wrote:
> >>>> 
> >>>> 
> >>>>> On 31 Oct 2018, at 20:59, Dr. David Alan Gilbert <dgilbert@xxxxxxxxxx> wrote:
> >>>>> 
> >>>>> * Liran Alon (liran.alon@xxxxxxxxxx) wrote:
> >>>>>> 
> >>>>>> 
> >>>>>>> On 31 Oct 2018, at 20:19, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> >>>>>>> 
> >>>>>>> On 31/10/2018 19:17, Eduardo Habkost wrote:
> >>>>>>>> On Wed, Oct 31, 2018 at 03:03:34AM +0200, Liran Alon wrote:
> >>>>>>>>> Ping.
> >>>>>>>>> The patch was submitted almost two months ago and I haven't seen any response to v2 of this series.
> >>>>>>>> Sorry for the long delay. This was on my queue of patches to be
> >>>>>>>> reviewed, but I'm failing to keep up with the rate of incoming
> >>>>>>>> patches. I will try to review the series next week.
> >>>>>>> 
> >>>>>>> I have already reviewed it; unfortunately I missed the soft freeze
> >>>>>>> for posting the version I had also been working on when Liran posted
> >>>>>>> these patches.
> >>>>>>> 
> >>>>>>> Paolo
> >>>>>> 
> >>>>>> Paolo, note that this is v2 of this patch series. It's not the one you have already reviewed.
> >>>>>> It now correctly handles the case you mentioned in the review of v1 of migrating with various nested_state buffer sizes.
> >>>>>> The following scenarios were tested:
> >>>>>> (a) src and dest have the same nested state size.
> >>>>>>     ==> Migration succeeds.
> >>>>>> (b) src doesn't have nested state while dest does.
> >>>>>>     ==> Migration succeeds and src doesn't send its nested state.
> >>>>>> (c) src has nested state while dest doesn't.
> >>>>>>     ==> Migration fails as it cannot restore the nested state.
> >>>>>> (d) dest has a bigger max nested state size than src.
> >>>>>>     ==> Migration succeeds.
> >>>>>> (e) dest has a smaller max nested state size than src, but big enough to store src's saved nested state.
> >>>>>>     ==> Migration succeeds.
> >>>>>> (f) dest has a smaller max nested state size than src, and not big enough to store src's saved nested state.
> >>>>>>     ==> Migration fails.
> >>>>> 
> >>>>> Is it possible to tell these limits before the start of the migration,
> >>>>> or do we only find out that a nested migration won't work by trying it?
> >>>>> 
> >>>>> Dave
> >>>> 
> >>>> It is possible for the destination host to query its max nested state size.
> >>>> (This is what is returned from "kvm_check_extension(s, KVM_CAP_NESTED_STATE);" see the kvm_init() code.)
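For reference, querying that limit from userspace is just a KVM_CHECK_EXTENSION ioctl on /dev/kvm. A minimal standalone sketch (not the actual kvm_init() code; it assumes a kernel and headers new enough to define KVM_CAP_NESTED_STATE):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm_fd = open("/dev/kvm", O_RDWR);
        if (kvm_fd < 0) {
            perror("open /dev/kvm");
            return 1;
        }

        /* KVM_CHECK_EXTENSION on KVM_CAP_NESTED_STATE returns the maximum
         * size in bytes of the nested state this kernel can save/restore,
         * or 0 if the kernel has no nested state support at all. */
        int max_size = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NESTED_STATE);
        printf("max nested state size: %d bytes\n", max_size);

        close(kvm_fd);
        return 0;
    }

A management layer could run something like this on both hosts and compare the two values before kicking off a migration.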
> >>> 
> >>> Is this max size a function of:
> >>> a) The host CPU
> >>> b) The host kernel
> >>> c) Some configuration
> >>> 
> >>> or all of those?
> >>> 
> >>> What about the maximum size that will be sent?
> >> 
> >> The max size is a function of (b). It depends on your KVM capabilities.
> >> The size that will be sent is also the max size at the source.
> > 
> > So if I have matching host kernels it should always work?
> 
> Yes.

OK, that's a good start.

> > What happens if I upgrade the source kernel to increase its maximum
> > nested size, can I force it to keep things small for some VMs?
> 
> Currently, the IOCTL which saves the nested state has only a single version, which could potentially fill the entire size (depending on the current guest state).
> Saving the nested state obviously always attempts to save all relevant information it has, because otherwise you would deliberately not transfer to the destination part of the
> state it needs to continue running the migrated guest correctly at the destination.
> 
> It is true that I do expect future versions of this IOCTL to be able to "set nested state" of older versions (smaller ones which lack some information),
> but I do expect migration to fail gracefully (and the guest to continue running at the source) if the destination is not capable of restoring all the state sent from the source
> (when the destination's max nested state size is smaller than the source's nested state size).

That older-version thing would be good; then we would tie that to the versioned machine types and/or CPU models somehow; that way every migration of a 3.2 QEMU with a given CPU would work irrespective of the host version.

> > 
> >>> 
> >>>> However, I didn't find any suitable mechanism in the QEMU live-migration code to negotiate
> >>>> this destination host information with the source prior to migration. Which kind of makes sense, as
> >>>> this code is, to my understanding, also used in the standard suspend/resume flow. In that scenario,
> >>>> there is no other side to negotiate with.
> >>> 
> >>> At the moment, if the user has appropriately configured their QEMU and
> >>> there are no I/O errors, the migration should not fail; and the
> >>> management layer can be responsible for configuring the QEMU and checking
> >>> compatibilities -
> >> 
> >> I agree with this. The post_load() checks I have done are extra measures to prevent a failing migration.
> >> The management layer can perform extra work before the migration to make sure that the source can actually migrate to the destination,
> >> taking the max size of the nested state into account.
> >> 
> >>> 
> >>>> So currently what happens in my patch is that the source prepares the nested state buffer and sends it to the destination as part of the VMState,
> >>>> and the destination attempts to load this nested state in its nested_state_post_load() function.
> >>>> If the destination kernel cannot handle loading the nested state it has received from the source, it fails the migration by returning
> >>>> failure from nested_state_post_load().
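In outline, that destination-side logic amounts to something like the sketch below; the helper name and how the state buffer reaches it are hypothetical stand-ins, not the actual patch:

    #include <errno.h>
    #include <linux/kvm.h>
    #include "sysemu/kvm.h"  /* QEMU-internal: kvm_state, kvm_check_extension(),
                              * kvm_vcpu_ioctl() */

    /* Illustrative sketch only, called from the post_load() path with the
     * nested state buffer received from the source. */
    static int load_nested_state(CPUState *cpu, struct kvm_nested_state *state)
    {
        /* What this kernel can hold, as advertised by KVM_CAP_NESTED_STATE. */
        int max_size = kvm_check_extension(kvm_state, KVM_CAP_NESTED_STATE);

        if (state->size > max_size) {
            /* Scenario (f) above: the source's state doesn't fit here.
             * An error from post_load() fails the incoming migration,
             * and the guest keeps running on the source. */
            return -EINVAL;
        }

        /* Ask the destination kernel to restore it; this can still fail
         * if the content itself isn't restorable on this host. */
        return kvm_vcpu_ioctl(cpu, KVM_SET_NESTED_STATE, state);
    }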
> >>> 
> >>> That's minimally OK, but ideally we'd have a way of knowing before we
> >>> start the migration whether it is going to work; that way the management
> >>> layer can refuse it before it spends ages transferring data and then
> >>> trying to switch over - recovery from a failed migration can be a bit
> >>> tricky/risky.
> >>> 
> >>> I can't see it being much use in a cloud environment if the migration
> >>> will sometimes fail without the cloud admin being able to do something
> >>> about it.
> >> 
> >> With the current QEMU live-migration code, I believe this is the responsibility of the control plane.
> >> It's no different from the fact that the control plane is also responsible for launching the destination QEMU with the same vHW configuration
> >> as the source QEMU. If there were some kind of negotiation mechanism in QEMU that verified these vHW configurations match *before*
> >> starting the migration (rather than this being part of the control plane), I would also have added the max nested state size to that negotiation.
> >> But with the current mechanisms, this is solely the responsibility of the control plane.
> > 
> > Yep, that's fine - all we have to make sure of is that:
> > a) The control plane can somehow find out the maximums (via QEMU or
> >    something else on the host) so it can make that decision.
> 
> This is easy for QEMU or something else on the host to verify. Just a matter of sending an IOCTL to KVM.

OK, that's probably fine; we should check with the libvirt people what they would like.

> > b) Think about how upgrades will work when one host is newer.
> 
> I made sure that it is possible for the destination to accept a nested state of a size smaller than its max nested state size.
> Therefore, it should allow upgrades from one host to a newer host.

OK; the tricky thing is when you upgrade one host in a small cluster as you start doing an upgrade, and then once it's got its first VM you can't migrate away from it until the others are updated; that gets messy.

Dave

> (If, of course, the IOCTL which "sets nested state" is written correctly, in a compatible way.)
> 
> -Liran
> 
> > 
> > Dave
> > 
> >>> 
> >>>> Do you have a better suggestion?
> >>> 
> >>> I'll need to understand a bit more about the nested state.
> >> 
> >> Feel free to ask more questions about it and I will try to answer.
> >> 
> >> Thanks,
> >> -Liran
> >> 
> >>> 
> >>> Dave
> >>> 
> >>>> -Liran
> >>>> 
> >>>>> 
> >>>>>> -Liran
> >>>>>> 
> >>>>>> 
> >>>>> -- 
> >>>>> Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
> >>>> 
> >>> -- 
> >>> Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
> >> 
> > -- 
> > Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
> 
-- 
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK