On Tue, 2014-03-04 at 10:30 +0800, Gavin Shan wrote: > On Mon, Mar 03, 2014 at 12:36:16PM -0700, Alex Williamson wrote: > >On Mon, 2014-03-03 at 14:10 +0800, Gavin Shan wrote: > >> On Sun, Mar 02, 2014 at 09:49:49PM -0700, Alex Williamson wrote: > >> >On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote: > >> >> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote: > > ../... > > >> > >> Various PCI devices could have different ways to do BIST. It's a bit > >> hard to capture all of them, I think. Alex, please advise which way > >> to fix the issue would be better. Or we can have this patch and figure > >> out the way using vfio_bar_restore() later? :-) > > > >This approach arguably has a very niche application. I'd rather see > >this device fall into the vfio_bar_restore() and add a callout to the > >vfio irq code to restore the MSIX table and MSI/MSIX capabilities if the > >surprise reset occurred with either enabled. That seems like it would > >be generally useful to any device with a backdoor reset. Besides, if > >the BARs on the device are being deprogrammed by BIST, then it must > >already be hitting the vfio_bar_restore code or just re-writing the > >vector table entry wouldn't be enough to get it running again. Thanks, > > > > I had some debugging output in vfio_bar_restore() for my case, which is > pci_save_state(), BIST and then pci_restore_state() on guest side. I never > saw those debugging output from vfio_bar_restore(). Host VFIO-PCI has cached > PCI config space, which was built in advance, and BIST on guest side didn't > get altered PCI_COMMAND (memory/IO enable). So we didn't call into the function > where we expect to restore MSIx stuff, Or I missed your point? :-) Ok, I think I'm slowly wrapping my head around exactly what's going on, let me summarize to make sure I understand. The driver for this device does some degree of configuration, including enabling MSI-X for the device. It then initiates a BIST via non-standard memory mapped registers which resets some parts of the device, including the MMIO space containing the MSI-X vector table, but not including PCI config space of the device. In particular, the COMMAND register and BAR values in PCI config space are left intact (and apparently the MSI-X capability as well). If the driver wanted to be really evil, it could do this with vectors enabled and rely on the reset value of the MSI-X vectors being masked. However, we get lucky and the driver has all of the vectors masked prior to reset so that after reset we see the masked to unmasked transition and use that to program the eventfd, which is where you'd like to re-write the vector message. Is that all correct? If the rest of config space is unaffected by this reset then my suggestion to tie it into vfio_bar_restore() probably doesn't make sense. In fact, it doesn't seem like there's any config space interaction we could count on to reliably detect this. Thinking out loud, I wonder if there's any value to reading the MSI-X message from the device and comparing to the cached value so that we can at least detect that we've encountered this situation. If we did that, we could restore all of the vectors in the table. However, it seems like the only value that would add would be to write a message to syslog since the guest will need to re-write any vectors it intends to use (and assuming we can count on vectors being masked after reset as the spec indicates). So, I guess I can't think of anything better than what you proposed (with better comments to detail the situation for later). Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html