* Laine Stump (laine@xxxxxxxxxx) wrote:
> On 05/13/2015 10:42 AM, Dr. David Alan Gilbert wrote:
> > * Laine Stump (laine@xxxxxxxxxx) wrote:
> >> On 05/13/2015 04:28 AM, Peter Krempa wrote:
> >>> On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
> >>>> * Peter Krempa (pkrempa@xxxxxxxxxx) wrote:
> >>>>> On Wed, May 13, 2015 at 11:36:26 +0800, Chen Fan wrote:
> >>>>>> my main goal is to add support for migration with host NIC
> >>>>>> passthrough devices and keep network connectivity.
> >>>>>>
> >>>>>> this patch series is based on Shradha's patches at
> >>>>>> https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html
> >>>>>> which add migration support for host passthrough devices.
> >>>>>>
> >>>>>> 1) unplug the ephemeral devices before migration
> >>>>>>
> >>>>>> 2) do native migration
> >>>>>>
> >>>>>> 3) when migration has finished, hotplug the ephemeral devices
> >>>>>
> >>>>> IMHO this algorithm is something that an upper layer management app
> >>>>> should do. The device unplug operation is complex and it might not
> >>>>> succeed, which will make the current migration thread hang, or fail
> >>>>> in an intermediate state that is not recoverable.
> >>>>
> >>>> However, you wouldn't want each of the upper layer management apps
> >>>> implementing their own hacks for this; so something somewhere needs
> >>>> to standardise what the guest sees.
> >>>
> >>> The guest will still see a PCI device unplug request and will have to
> >>> respond to it, then will be paused, and after resume a new PCI device
> >>> will appear. This is standardised. The nonstandardised part (which
> >>> can't really be standardised) is how the bonding or other
> >>> guest-dependent stuff will be handled, but that is up to the guest OS
> >>> to handle.
> >>>
> >>> From libvirt's perspective this is only something that will trigger
> >>> the device unplug and plug the devices back. And there are a lot of
> >>> issues here:
> >>>
> >>> 1) The destination of the migration might not have the desired
> >>> devices.
> >>>
> >>> This will trigger a lot of problems, as we will not be able to
> >>> guarantee that the devices reappear on the destination, and if we
> >>> wanted to check we'd need a new migration protocol AFAIK.
> >>>
> >>> 2) The guest OS might refuse to detach the PCI device (it might be
> >>> stuck before the PCI code is loaded).
> >>>
> >>> In that case the migration will be stuck forever, and abort attempts
> >>> will leave the domain state basically undefined, depending on the
> >>> phase where it failed.
> >>>
> >>> Since we can't guarantee that the unplug of the PCI host devices will
> >>> be atomic, or that it will succeed, we basically can't guarantee in
> >>> any way in which state the VM will end up after a (possibly failed)
> >>> migration. To recover such a state there are too many options that
> >>> could be desired by the user, and they would be hard to implement in
> >>> a way that is flexible enough.
> >>
> >> In the past I've been on the side of having libvirt automatically do
> >> the device detach and reattach (but definitely on the side of the
> >> guest agent and libvirt keeping their hands off of network
> >> configuration in the guest), with the thinking that 1) libvirt is in a
> >> well-situated spot to do it, and 2) this would eliminate duplicate
> >> code in the upper level management.
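(For reference: the detach/migrate/reattach flow being debated here maps
onto the libvirt API roughly as in the sketch below, using the
libvirt-python bindings. The domain name, destination URI, and hostdev
XML are placeholder assumptions, not anything taken from the series.)

    # Minimal sketch of the unplug -> migrate -> replug sequence.
    import libvirt

    # Placeholder PCI hostdev, as it would appear in the domain XML.
    HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    """

    src = libvirt.open('qemu:///system')
    dom = src.lookupByName('guest1')

    # 1) unplug the ephemeral device before migration; completion
    #    depends on the guest acknowledging the unplug, so this step
    #    can fail or stall (Peter's issue 2 above).
    dom.detachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

    # 2) do native migration.
    dst = libvirt.open('qemu+ssh://dest-host/system')
    newdom = dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

    # 3) when migration has finished, hotplug the device on the
    #    destination. If this fails, the guest is already running there
    #    and there is no clean way to roll the migration back.
    newdom.attachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)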
> >>
> >> However, Peter's points above made me consider the failure cases more
> >> closely, in particular this one:
> >>
> >> * the destination claims to have the resources required (right type
> >>   of PCI device, enough RAM), so migration is started.
> >>
> >> * device detached on source, guest memory migrated to destination.
> >>
> >> * guest started - no problems. (At this point, since the guest has
> >>   been restarted, it's not really possible for libvirt to fail the
> >>   migration in a recoverable manner (unless you want to implement
> >>   some sort of "unmigration" so that the guest state on the source is
> >>   updated with whatever execution occurred on the destination, and I
> >>   don't think *anyone* wants to go there).)
> >>
> >> * libvirt finds the device still available and attempts to attach it,
> >>   but (for some odd reason) fails.
> >>
> >> Now libvirt can't tell the application that the migration has
> >> succeeded, because it didn't (unless the device was marked as
> >> "optional"), but it also can't fail the migration except to say "this
> >> is such a monumental failure that your guest has simply died".
> >>
> >> If, on the other hand, the detach and re-attach are implemented in a
> >> higher layer (oVirt/OpenStack), they will at least have the guest in
> >> a state they can deal with - it won't be pretty, but they could for
> >> example migrate the guest to another host (maybe back to the source)
> >> and re-attach there.
> >>
> >> So this one message from Peter has nicely pointed out the error in my
> >> thinking, and I now agree that auto-detach/reattach shouldn't be
> >> implemented in libvirt - it would work nicely in an error-free world,
> >> but would crumble in the face of some errors. (I just wish I had
> >> considered the particular failure mode above a year or two ago, so I
> >> could have been more discouraging in my emails then :-)
> >
> > It's a shame to limit the utility of this by dealing with an error
> > case that's not a fatal error. Does libvirt not have a way of dealing
> > with non-fatal errors?
>
> But is it non-fatal? Dan's point is that it isn't up to libvirt to
> decide. In the case of attached USB devices, there is an attribute
> called startupPolicy which can be set to "mandatory", "requisite" or
> "optional". The first would cause a failure of the migration if the
> device wasn't present on the destination of the migration, while the
> other two would result in the device simply not being present on the
> destination. But USB works differently from PCI - I don't think it even
> detaches the device from the guest - so it doesn't have the same
> problems as a PCI device.
>
> Although libvirt can reserve the device on the destination before the
> migration starts, once the guest CPUs have been restarted there is
> currently "no going back". The only options would be 1) fail the
> migration and kill the guest on the destination (is there even a state
> for this?), or 2) implement new code to stop the CPUs and migrate the
> new memory state back to the source, restart the CPUs on the source,
> and report the migration as failed (not implemented, and wouldn't be
> very pretty).
>
> We *could* just unilaterally decide that all PCI assigned devices are
> "optional" on the destination, and report the migration as a success
> (just without the device being attached), but that is getting into the
> territory of "libvirt making policy decisions" as discussed by Dan.

I don't see it as policy; it's just that we only have a good solution
for the "optional" case.
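(For reference, startupPolicy sits on the <source> element of a USB
hostdev in the domain XML; a sketch with placeholder vendor/product IDs:

    <hostdev mode='subsystem' type='usb'>
      <source startupPolicy='optional'>
        <vendor id='0x1234'/>
        <product id='0xbeef'/>
      </source>
    </hostdev>

There is no equivalent attribute for PCI hostdevs, which is part of the
problem being discussed.)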
It's actually not the mechanics of doing the hot-add/remove that worry
me; getting a higher layer to do those is kind of OK. What I'm more
worried about is standardising the mechanism to let the guest know about
the pairs of devices, including when adding a new device. Since that
requires some guest cooperation, I wouldn't want the guest cooperation
to have to depend on which higher-level management system is used.

Dave
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK