On Wed, Nov 25, 2015 at 8:39 AM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > On Wed, Nov 25, 2015 at 08:24:38AM -0800, Alexander Duyck wrote: >> >> Also, assuming you just want to do ifdown/ifup for some reason, it's >> >> easy enough to do using a guest agent, in a completely generic way. >> >> >> > >> > Just ifdown/ifup is not enough for migration. It needs to restore some PCI >> > settings before doing ifup on the target machine >> >> That is why I have been suggesting making use of suspend/resume logic >> that is already in place for PCI power management. In the case of a >> suspend/resume we already have to deal with the fact that the device >> will go through a D0->D3->D0 reset so we have to restore all of the >> existing state. It would take a significant load off of Qemu since >> the guest would be restoring its own state instead of making Qemu have >> to do all of the device migration work. > > That can work, though again, the issue is you need guest > cooperation to migrate. Right now the problem is you need to have guest cooperation anyway as you need to have some way of tracking the dirty pages. If the IOMMU on the host were to provide some sort of dirty page tracking then we could exclude the guest from the equation, but until then we need the guest to notify us of what pages it is letting the device dirty. I'm still of the opinion that the best way to go there is to just modify the DMA API that is used in the guest so that it supports some sort of page flag modification or something along those lines so we can track all of the pages that might be written to by the device. > If you reset device on destination instead of restoring state, > then that issue goes away, but maybe the downtime > will be increased. Yes, the downtime will be increased, but it shouldn't be by much. Depending on the setup a VF with a single queue can have about 3MB of data outstanding when you move the driver over. After that it is just a matter of bringing the interface back up which should take only a few hundred milliseconds assuming the PF is fairly responsive. > Will it really? I think it's worth it to start with the > simplest solution (reset on destination) and see > what the effect is, then add optimizations. Agreed. My thought would be to start with something like dma_mark_clean() that could be used to take care of marking the pages for migration when they are unmapped or synced. > One thing that I've been thinking about for a while, is saving (some) > state speculatively. For example, notify guest a bit before migration > is done, so it can save device state. If guest responds quickly, you > have state that can be restored. If it doesn't, still migrate, and it > will have to reset on destination. I'm not sure how much more device state we really need to save. The driver in the guest has to have enough state to recover in the event of a device failure resulting in a slot reset. To top it off the driver is able to reconfigure things probably as quick as we could if we were restoring the state. -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html