On Sun, Dec 13, 2015 at 11:47:44PM +0800, Lan, Tianyu wrote:
>
>
> On 12/11/2015 1:16 AM, Alexander Duyck wrote:
> > On Thu, Dec 10, 2015 at 6:38 AM, Lan, Tianyu <tianyu.lan@xxxxxxxxx> wrote:
> >>
> >>
> >> On 12/10/2015 7:41 PM, Dr. David Alan Gilbert wrote:
> >>>> Ideally it would be possible to leave the guest driver unmodified,
> >>>> but that requires the hypervisor or qemu to be aware of the device,
> >>>> which means we may need a driver in the hypervisor or qemu to handle
> >>>> the device on behalf of the guest driver.
> >>>
> >>> Can you answer the question of when you use your code -
> >>>    at the start of migration or
> >>>    just before the end?
> >>
> >> Just before stopping the VCPU in this version, and we inject the VF
> >> mailbox irq to notify the driver if the irq handler is installed.
> >> The Qemu side also checks this via the fake PCI migration capability,
> >> and the driver sets the status during its open() or resume() callback.
> >
> > The VF mailbox interrupt is a very bad idea.  Really the device should
> > be in a reset state on the other side of a migration.  It doesn't make
> > sense to have the interrupt firing if the device is not configured.
> > This is one of the things that is preventing you from being able to
> > migrate the device while the interface is administratively down or the
> > VF driver is not loaded.
>
> In my opinion, if the VF driver is not loaded and the hardware hasn't
> started to work, the device state doesn't need to be migrated.
>
> We could add a flag for the driver to check whether a migration happened
> while it was down, reinitialize the hardware, and clear the flag when the
> system brings it back up.
>
> We could also add a migration core to the Linux kernel and provide helper
> functions to make it easier to add migration support to drivers. The
> migration core would be in charge of syncing status with Qemu.
>
> Example:
>
> migration_register()
>     The driver provides
>     - callbacks to be called before and after migration, and for the bad path
>     - the irq it prefers to use for migration events
>
> migration_event_check()
>     The driver calls this from its irq handler. The migration core checks
>     the migration status and invokes the driver's callbacks when a
>     migration happens.
>
> > My thought on all this is that it might make sense to move this
> > functionality into a PCI-to-PCI bridge device and make it a
> > requirement that all direct-assigned devices have to exist behind that
> > device in order to support migration.  That way you would be working
> > with a directly emulated device that would likely already be
> > supporting hot-plug anyway.  Then it would just be a matter of coming
> > up with a few Qemu specific extensions that you would need to add to
> > the device itself.  The same approach would likely be portable enough
> > that you could achieve it with PCIe as well via the same configuration
> > space being present on the upstream side of a PCIe port or maybe a
> > PCIe switch of some sort.
> >
> > It would then be possible to signal via your vendor-specific PCI
> > capability on that device that all devices behind this bridge require
> > DMA page dirtying, you could use the configuration in addition to the
> > interrupt already provided for hot-plug to signal things like when you
> > are starting migration, and possibly even just extend the shpc
> > functionality so that if this capability is present you have the
> > option to pause/resume instead of remove/probe the device in the case
> > of certain hot-plug events.  The fact is there may be some use for a
> > pause/resume type approach for PCIe hot-plug in the near future
> > anyway.
> > From the sounds of it Apple has required it for all Thunderbolt device
> > drivers so that they can halt the device in order to shuffle resources
> > around, perhaps we should look at something similar for Linux.
> >
> > The other advantage behind grouping functions on one bridge is things
> > like reset domains.  The PCI error handling logic will want to be able
> > to reset any devices that experienced an error in the event of
> > something such as a surprise removal.  By grouping all of the devices
> > you could disable/reset/enable them as one logical group in the event
> > of something such as the "bad path" approach Michael has mentioned.
>
> This sounds like we need to add a fake bridge for migration and a driver
> for it in the guest. It also requires extending the PCI bus/hotplug
> driver to pause/resume the other devices, right?
>
> My concern is still whether we can change the PCI bus/hotplug code like
> that without a spec change.
>
> The IRQ should be generic for any device, and we can extend it for
> migration. The device driver can also decide whether or not to support
> migration.

A dedicated IRQ per device for something that is a system-wide event
sounds like a waste.

I don't understand why a spec change is strictly required: we only need
to support this with the specific virtual bridge used by QEMU, so I think
a vendor-specific capability will do.  Once this works well in the field,
a PCI spec ECN might make sense to standardise the capability.  I've put
rough sketches of both the migration helper idea and the capability
lookup at the end of this mail.

> >>>>>>> It would be great if we could avoid changing the guest; but at
> >>>>>>> least your guest driver changes don't actually seem to be that
> >>>>>>> hardware specific; could your changes actually be moved to the
> >>>>>>> generic PCI level so they could be made to work for lots of
> >>>>>>> drivers?
> >>>>
> >>>>> It is impossible to use one common solution for all devices unless
> >>>>> the PCIe spec documents it clearly, and I think one day it will be
> >>>>> there.  But before that, we need some workarounds in the guest
> >>>>> driver to make it work, even if it looks ugly.
> >>
> >> Yes, so far there is no hardware migration support and it's hard to
> >> modify bus-level code.  It would also block an implementation on
> >> Windows.
> >
> > Please don't assume things.  Unless you have hard data from Microsoft
> > that says they want it this way, let's just try to figure out what works
> > best for us for now, and then we can start worrying about third-party
> > implementations after we have figured out a solution that actually
> > works.
> >
> > - Alex
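
Just to make sure we're talking about the same thing, here is a rough
sketch of what I understand the proposed guest-side migration core to
look like.  None of this exists today; every name below (struct
migration_ops, migration_register(), migration_event_check(), my_vf_*)
is hypothetical and only restates the proposal above in code form:

#include <linux/interrupt.h>
#include <linux/pci.h>

/* Callbacks a VF driver would hand to the (hypothetical) migration core. */
struct migration_ops {
	int  (*pre_migration)(struct pci_dev *pdev);   /* before the VCPUs stop */
	int  (*post_migration)(struct pci_dev *pdev);  /* after resume on the target */
	void (*bad_path)(struct pci_dev *pdev);        /* migration failed, recover */
};

/*
 * The driver registers its callbacks and the irq it prefers to receive
 * migration events on; the migration core syncs status with Qemu.
 */
int migration_register(struct pci_dev *pdev, const struct migration_ops *ops,
		       unsigned int irq);
void migration_unregister(struct pci_dev *pdev);

/*
 * Called by the driver from its irq handler; the migration core reads the
 * status Qemu exposes and invokes the callbacks above when a migration is
 * in progress.
 */
void migration_event_check(struct pci_dev *pdev);

/* Example use from a VF driver's mailbox irq handler. */
struct my_vf_adapter {
	struct pci_dev *pdev;
	/* ... rest of the driver's private state ... */
};

static irqreturn_t my_vf_mbx_irq(int irq, void *data)
{
	struct my_vf_adapter *adapter = data;

	migration_event_check(adapter->pdev);
	/* ... normal mailbox processing ... */
	return IRQ_HANDLED;
}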
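
And for the capability itself, the guest side could be as simple as
walking the config space of the virtual bridge above the assigned device.
pci_upstream_bridge(), pci_find_capability() and PCI_CAP_ID_VNDR are
existing kernel interfaces; the layout of the capability body below is
made up purely for illustration, the real layout would be whatever we
define for the QEMU bridge:

#include <linux/errno.h>
#include <linux/pci.h>

/* Made-up layout of a vendor-specific migration capability on the bridge. */
#define MIG_CAP_STATUS		4	/* offset of the status byte in the cap  */
#define  MIG_STATUS_START	0x01	/* migration is starting, quiesce the VF */
#define  MIG_STATUS_DONE	0x02	/* migration completed, device was moved */

static int migration_cap_status(struct pci_dev *vf)
{
	struct pci_dev *bridge = pci_upstream_bridge(vf);
	int pos;
	u8 status;

	if (!bridge)
		return -ENODEV;

	/* First vendor-specific capability (cap ID 0x09) on the QEMU bridge. */
	pos = pci_find_capability(bridge, PCI_CAP_ID_VNDR);
	if (!pos)
		return -ENODEV;

	pci_read_config_byte(bridge, pos + MIG_CAP_STATUS, &status);
	return status;
}

The migration core (or the bridge driver) could read this in response to
the bridge's existing interrupt and then call the registered callbacks,
instead of each VF needing its own mailbox irq for migration events.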