On Sun, Dec 13, 2015 at 11:47:44PM +0800, Lan, Tianyu wrote:
>
>
> On 12/11/2015 1:16 AM, Alexander Duyck wrote:
> > On Thu, Dec 10, 2015 at 6:38 AM, Lan, Tianyu <tianyu.lan@xxxxxxxxx> wrote:
> >>
> >>
> >> On 12/10/2015 7:41 PM, Dr. David Alan Gilbert wrote:
> >>>> Ideally it would be possible to leave the guest driver unmodified,
> >>>> but that requires the hypervisor or qemu to be aware of the device,
> >>>> which means we may need a driver in the hypervisor or qemu to handle
> >>>> the device on behalf of the guest driver.
> >>>
> >>> Can you answer the question of when you use your code -
> >>>    at the start of migration or
> >>>    just before the end?
> >>
> >> Just before stopping the VCPU in this version, and we inject the VF
> >> mailbox irq to notify the driver if the irq handler is installed.
> >> The Qemu side also checks this via the fake PCI migration capability,
> >> and the driver sets the status during its open() or resume() callback.
> >
> > The VF mailbox interrupt is a very bad idea.  Really the device should
> > be in a reset state on the other side of a migration.  It doesn't make
> > sense to have the interrupt firing if the device is not configured.
> > This is one of the things that is preventing you from being able to
> > migrate the device while the interface is administratively down or the
> > VF driver is not loaded.
>
> In my opinion, if the VF driver is not loaded and the hardware hasn't
> started to work, the device state doesn't need to be migrated.
>
> We could add a flag for the driver to check whether a migration happened
> while it was down, reinitialize the hardware, and clear the flag when the
> system brings it back up.
>
> We could also add a migration core to the Linux kernel and provide helper
> functions to make it easier to add migration support to drivers. The
> migration core would be in charge of syncing status with Qemu.
>
> Example:
>
> migration_register()
>     The driver provides
>     - callbacks to be called before and after migration, and for the bad path
>     - the irq it prefers to use for migration events
>
> migration_event_check()
>     The driver calls this from its irq handler. The migration core checks
>     the migration status and invokes the driver's callbacks when a
>     migration happens.
>
> > My thought on all this is that it might make sense to move this
> > functionality into a PCI-to-PCI bridge device and make it a
> > requirement that all direct-assigned devices have to exist behind that
> > device in order to support migration.  That way you would be working
> > with a directly emulated device that would likely already be
> > supporting hot-plug anyway.  Then it would just be a matter of coming
> > up with a few Qemu specific extensions that you would need to add to
> > the device itself.  The same approach would likely be portable enough
> > that you could achieve it with PCIe as well via the same configuration
> > space being present on the upstream side of a PCIe port or maybe a
> > PCIe switch of some sort.
> >
> > It would then be possible to signal via your vendor-specific PCI
> > capability on that device that all devices behind this bridge require
> > DMA page dirtying, you could use the configuration in addition to the
> > interrupt already provided for hot-plug to signal things like when you
> > are starting migration, and possibly even just extend the shpc
> > functionality so that if this capability is present you have the
> > option to pause/resume instead of remove/probe the device in the case
> > of certain hot-plug events.  The fact is there may be some use for a
> > pause/resume type approach for PCIe hot-plug in the near future
> > anyway.
> > From the sounds of it Apple has required it for all Thunderbolt device
> > drivers so that they can halt the device in order to shuffle resources
> > around, perhaps we should look at something similar for Linux.
> >
> > The other advantage behind grouping functions on one bridge is things
> > like reset domains.  The PCI error handling logic will want to be able
> > to reset any devices that experienced an error in the event of
> > something such as a surprise removal.  By grouping all of the devices
> > you could disable/reset/enable them as one logical group in the event
> > of something such as the "bad path" approach Michael has mentioned.
>
> This sounds like we need to add a fake bridge for migration and a driver
> for it in the guest. It also requires extending the PCI bus/hotplug
> driver to pause/resume the other devices, right?
>
> My concern is still whether we can change the PCI bus/hotplug code like
> that without a spec change.
>
> The IRQ should be generic for any device, and we can extend it for
> migration. The device driver can also decide whether or not to support
> migration.

A dedicated IRQ per device for something that is a system-wide event
sounds like a waste.

I don't understand why a spec change is strictly required: we only need
to support this with the specific virtual bridge used by QEMU, so I think
a vendor-specific capability will do.  Once this works well in the field,
a PCI spec ECN might make sense to standardise the capability.  I've put
rough sketches of both the migration helper idea and the capability
lookup at the end of this mail.

> >>>>>>> It would be great if we could avoid changing the guest; but at
> >>>>>>> least your guest driver changes don't actually seem to be that
> >>>>>>> hardware specific; could your changes actually be moved to the
> >>>>>>> generic PCI level so they could be made to work for lots of
> >>>>>>> drivers?
> >>>>
> >>>>> It is impossible to use one common solution for all devices unless
> >>>>> the PCIe spec documents it clearly, and I think one day it will be
> >>>>> there.  But before that, we need some workarounds in the guest
> >>>>> driver to make it work, even if it looks ugly.
> >>
> >> Yes, so far there is no hardware migration support and it's hard to
> >> modify bus-level code.  It would also block an implementation on
> >> Windows.
> >
> > Please don't assume things.  Unless you have hard data from Microsoft
> > that says they want it this way, let's just try to figure out what works
> > best for us for now, and then we can start worrying about third-party
> > implementations after we have figured out a solution that actually
> > works.
> >
> > - Alex
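
Just to make sure we're talking about the same thing, here is a rough
sketch of what I understand the proposed guest-side migration core to
look like.  None of this exists today; every name below (struct
migration_ops, migration_register(), migration_event_check(), my_vf_*)
is hypothetical and only restates the proposal above in code form:

#include <linux/interrupt.h>
#include <linux/pci.h>

/* Callbacks a VF driver would hand to the (hypothetical) migration core. */
struct migration_ops {
	int  (*pre_migration)(struct pci_dev *pdev);   /* before the VCPUs stop */
	int  (*post_migration)(struct pci_dev *pdev);  /* after resume on the target */
	void (*bad_path)(struct pci_dev *pdev);        /* migration failed, recover */
};

/*
 * The driver registers its callbacks and the irq it prefers to receive
 * migration events on; the migration core syncs status with Qemu.
 */
int migration_register(struct pci_dev *pdev, const struct migration_ops *ops,
		       unsigned int irq);
void migration_unregister(struct pci_dev *pdev);

/*
 * Called by the driver from its irq handler; the migration core reads the
 * status Qemu exposes and invokes the callbacks above when a migration is
 * in progress.
 */
void migration_event_check(struct pci_dev *pdev);

/* Example use from a VF driver's mailbox irq handler. */
struct my_vf_adapter {
	struct pci_dev *pdev;
	/* ... rest of the driver's private state ... */
};

static irqreturn_t my_vf_mbx_irq(int irq, void *data)
{
	struct my_vf_adapter *adapter = data;

	migration_event_check(adapter->pdev);
	/* ... normal mailbox processing ... */
	return IRQ_HANDLED;
}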
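
And for the capability itself, the guest side could be as simple as
walking the config space of the virtual bridge above the assigned device.
pci_upstream_bridge(), pci_find_capability() and PCI_CAP_ID_VNDR are
existing kernel interfaces; the layout of the capability body below is
made up purely for illustration, the real layout would be whatever we
define for the QEMU bridge:

#include <linux/errno.h>
#include <linux/pci.h>

/* Made-up layout of a vendor-specific migration capability on the bridge. */
#define MIG_CAP_STATUS		4	/* offset of the status byte in the cap  */
#define  MIG_STATUS_START	0x01	/* migration is starting, quiesce the VF */
#define  MIG_STATUS_DONE	0x02	/* migration completed, device was moved */

static int migration_cap_status(struct pci_dev *vf)
{
	struct pci_dev *bridge = pci_upstream_bridge(vf);
	int pos;
	u8 status;

	if (!bridge)
		return -ENODEV;

	/* First vendor-specific capability (cap ID 0x09) on the QEMU bridge. */
	pos = pci_find_capability(bridge, PCI_CAP_ID_VNDR);
	if (!pos)
		return -ENODEV;

	pci_read_config_byte(bridge, pos + MIG_CAP_STATUS, &status);
	return status;
}

The migration core (or the bridge driver) could read this in response to
the bridge's existing interrupt and then call the registered callbacks,
instead of each VF needing its own mailbox irq for migration events.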