* Laine Stump (laine@xxxxxxxxxx) wrote:
> On 05/13/2015 10:42 AM, Dr. David Alan Gilbert wrote:
> > * Laine Stump (laine@xxxxxxxxxx) wrote:
> >> On 05/13/2015 04:28 AM, Peter Krempa wrote:
> >>> On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
> >>>> * Peter Krempa (pkrempa@xxxxxxxxxx) wrote:
> >>>>> On Wed, May 13, 2015 at 11:36:26 +0800, Chen Fan wrote:
> >>>>>> my main goal is to add support for migration with host NIC
> >>>>>> passthrough devices and keep network connectivity.
> >>>>>>
> >>>>>> this patch series is based on Shradha's patches at
> >>>>>> https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html
> >>>>>> which add migration support for host passthrough devices.
> >>>>>>
> >>>>>> 1) unplug the ephemeral devices before migration
> >>>>>>
> >>>>>> 2) do native migration
> >>>>>>
> >>>>>> 3) when migration has finished, hotplug the ephemeral devices
> >>>>>
> >>>>> IMHO this algorithm is something that an upper layer management app
> >>>>> should do. The device unplug operation is complex and it might not
> >>>>> succeed, which will make the current migration thread hang, or fail
> >>>>> in an intermediate state that is not recoverable.
> >>>>
> >>>> However, you wouldn't want each of the upper layer management apps
> >>>> implementing their own hacks for this; so something somewhere needs
> >>>> to standardise what the guest sees.
> >>>
> >>> The guest will still see a PCI device unplug request and will have to
> >>> respond to it, then will be paused, and after resume a new PCI device
> >>> will appear. This is standardised. The nonstandardised part (which
> >>> can't really be standardised) is how the bonding or other
> >>> guest-dependent stuff will be handled, but that is up to the guest OS
> >>> to handle.
> >>>
> >>> From libvirt's perspective this is only something that will trigger
> >>> the device unplug and plug the devices back. And there are a lot of
> >>> issues here:
> >>>
> >>> 1) The destination of the migration might not have the desired
> >>> devices.
> >>>
> >>> This will trigger a lot of problems, as we will not be able to
> >>> guarantee that the devices reappear on the destination, and if we
> >>> wanted to check we'd need a new migration protocol AFAIK.
> >>>
> >>> 2) The guest OS might refuse to detach the PCI device (it might be
> >>> stuck before the PCI code is loaded).
> >>>
> >>> In that case the migration will be stuck forever, and abort attempts
> >>> will leave the domain state basically undefined, depending on the
> >>> phase where it failed.
> >>>
> >>> Since we can't guarantee that the unplug of the PCI host devices will
> >>> be atomic, or that it will succeed, we basically can't guarantee in
> >>> any way in which state the VM will end up after a (possibly failed)
> >>> migration. To recover such a state there are too many options that
> >>> could be desired by the user, and they would be hard to implement in
> >>> a way that is flexible enough.
> >>
> >> In the past I've been on the side of having libvirt automatically do
> >> the device detach and reattach (but definitely on the side of the
> >> guest agent and libvirt keeping their hands off of network
> >> configuration in the guest), with the thinking that 1) libvirt is in a
> >> well-situated spot to do it, and 2) this would eliminate duplicate
> >> code in the upper level management.
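(For reference: the detach/migrate/reattach flow being debated here maps
onto the libvirt API roughly as in the sketch below, using the
libvirt-python bindings. The domain name, destination URI, and hostdev
XML are placeholder assumptions, not anything taken from the series.)

    # Minimal sketch of the unplug -> migrate -> replug sequence.
    import libvirt

    # Placeholder PCI hostdev, as it would appear in the domain XML.
    HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    """

    src = libvirt.open('qemu:///system')
    dom = src.lookupByName('guest1')

    # 1) unplug the ephemeral device before migration; completion
    #    depends on the guest acknowledging the unplug, so this step
    #    can fail or stall (Peter's issue 2 above).
    dom.detachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

    # 2) do native migration.
    dst = libvirt.open('qemu+ssh://dest-host/system')
    newdom = dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

    # 3) when migration has finished, hotplug the device on the
    #    destination. If this fails, the guest is already running there
    #    and there is no clean way to roll the migration back.
    newdom.attachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)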
> >>
> >> However, Peter's points above made me consider the failure cases more
> >> closely, in particular this one:
> >>
> >> * the destination claims to have the resources required (right type
> >>   of PCI device, enough RAM), so migration is started.
> >>
> >> * device detached on source, guest memory migrated to destination.
> >>
> >> * guest started - no problems. (At this point, since the guest has
> >>   been restarted, it's not really possible for libvirt to fail the
> >>   migration in a recoverable manner (unless you want to implement
> >>   some sort of "unmigration" so that the guest state on the source is
> >>   updated with whatever execution occurred on the destination, and I
> >>   don't think *anyone* wants to go there).)
> >>
> >> * libvirt finds the device still available and attempts to attach it,
> >>   but (for some odd reason) fails.
> >>
> >> Now libvirt can't tell the application that the migration has
> >> succeeded, because it didn't (unless the device was marked as
> >> "optional"), but it also can't fail the migration except to say "this
> >> is such a monumental failure that your guest has simply died".
> >>
> >> If, on the other hand, the detach and re-attach are implemented in a
> >> higher layer (oVirt/OpenStack), they will at least have the guest in
> >> a state they can deal with - it won't be pretty, but they could for
> >> example migrate the guest to another host (maybe back to the source)
> >> and re-attach there.
> >>
> >> So this one message from Peter has nicely pointed out the error in my
> >> thinking, and I now agree that auto-detach/reattach shouldn't be
> >> implemented in libvirt - it would work nicely in an error-free world,
> >> but would crumble in the face of some errors. (I just wish I had
> >> considered the particular failure mode above a year or two ago, so I
> >> could have been more discouraging in my emails then :-)
> >
> > It's a shame to limit the utility of this by dealing with an error
> > case that's not a fatal error. Does libvirt not have a way of dealing
> > with non-fatal errors?
>
> But is it non-fatal? Dan's point is that it isn't up to libvirt to
> decide. In the case of attached USB devices, there is an attribute
> called startupPolicy which can be set to "mandatory", "requisite" or
> "optional". The first would cause a failure of the migration if the
> device wasn't present on the destination of the migration, while the
> other two would result in the device simply not being present on the
> destination. But USB works differently from PCI - I don't think it even
> detaches the device from the guest - so it doesn't have the same
> problems as a PCI device.
>
> Although libvirt can reserve the device on the destination before the
> migration starts, once the guest CPUs have been restarted there is
> currently "no going back". The only options would be 1) fail the
> migration and kill the guest on the destination (is there even a state
> for this?), or 2) implement new code to stop the CPUs and migrate the
> new memory state back to the source, restart the CPUs on the source,
> and report the migration as failed (not implemented, and wouldn't be
> very pretty).
>
> We *could* just unilaterally decide that all PCI assigned devices are
> "optional" on the destination, and report the migration as a success
> (just without the device being attached), but that is getting into the
> territory of "libvirt making policy decisions" as discussed by Dan.

I don't see it as policy; it's just that we only have a good solution
for the "optional" case.
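(For reference, startupPolicy sits on the <source> element of a USB
hostdev in the domain XML; a sketch with placeholder vendor/product IDs:

    <hostdev mode='subsystem' type='usb'>
      <source startupPolicy='optional'>
        <vendor id='0x1234'/>
        <product id='0xbeef'/>
      </source>
    </hostdev>

There is no equivalent attribute for PCI hostdevs, which is part of the
problem being discussed.)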
It's actually not the mechanics of doing the hot-add/remove that worry
me; getting a higher layer to do those is kind of OK. What I'm more
worried about is standardising the mechanism to let the guest know about
the pairs of devices, including when adding a new device. Since that
requires some guest cooperation, I wouldn't want the guest cooperation
to have to depend on which higher-level management system is used.

Dave
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK