On 03/03/2014 02:51 PM, Benjamin Herrenschmidt wrote: > On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote: >> The problem is specific to the case of BIST issued applied to IPR >> adapter on the guest side. After BIST reset, we lose everything >> in MSIx table and we never have chance update MSIx messages for >> those enabled interrupts to MSIx table. >> >> The patch fixes it by writing MSIx message to MSIx table before >> reenabling them. > > That needs a better explanation... the guest does try to restore the > MSI-X (in a way similar to resuming from suspend) by writing the > message value back into the table. > > My understanding is that we trap that and turn that into a call to > vfio_msi_set_vector_signal, is that correct ? Yep. Gavin sent me a backtrace of what happens when the guest tries writing to a BAR: qemu/hw/pci/msix.c::msix_table_mmio_write msix_handle_mask_update msix_fire_vector_notifier qemu/hw/misc/vfio.c::vfio_msix_vector_use vfio_msix_vector_do_use IOCTL-to-VFIO: VFIO_DEVICE_SET_IRQS host/drivers/vfio/pci/vfio_pci.c::vfio_pci_ioctl host/drivers/vfio/pci/vfio_pci_intrs.c::vfio_pci_set_irqs_ioctl vfio_pci_set_msi_trigger vfio_msi_set_block vfio_msi_set_vector_signal While it works for our particular problem and seems correct, it has one flaw - hw/pci/msix.c will not generate this backtrace if masking bit does not change which can happen in general: === static void msix_handle_mask_update(PCIDevice *dev, int vector, bool was_masked) { bool is_masked = msix_is_masked(dev, vector); if (is_masked == was_masked) { return; } === Or if masking bit is the same, nothing bad is expected?... > In that case, it does indeed make sense to have that function rewrite > the cached message. > > (Just trying to understand how this patch fixes the problem we saw) > > I suppose other drivers would have similar issues if they have some > kind of internal "reset" path and try to save and restore the MSI-X > table. > Cheers, > Ben. > >> Reported-by: Wen Xiong <wenxiong@xxxxxxxxxx> >> Signed-off-by: Gavin Shan <shangw@xxxxxxxxxxxxxxxxxx> >> --- >> drivers/vfio/pci/vfio_pci_intrs.c | 19 +++++++++++++++++++ >> 1 file changed, 19 insertions(+) >> >> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c >> index 2103576..279ebd0 100644 >> --- a/drivers/vfio/pci/vfio_pci_intrs.c >> +++ b/drivers/vfio/pci/vfio_pci_intrs.c >> @@ -17,6 +17,7 @@ >> #include <linux/interrupt.h> >> #include <linux/eventfd.h> >> #include <linux/pci.h> >> +#include <linux/msi.h> >> #include <linux/file.h> >> #include <linux/poll.h> >> #include <linux/vfio.h> >> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, >> struct pci_dev *pdev = vdev->pdev; >> int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector; >> char *name = msix ? "vfio-msix" : "vfio-msi"; >> + struct msi_msg msg; >> struct eventfd_ctx *trigger; >> int ret; >> >> @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, >> return PTR_ERR(trigger); >> } >> >> + /* We possiblly lose the MSI/MSIx message in some cases. >> + * For example, BIST reset on IPR adapter. The MSIx table >> + * is cleaned out. However, we never get chance to put >> + * MSIx messages to MSIx table because all MSIx stuff is >> + * being cached in QEMU. Here, we had the trick to put the >> + * MSI/MSIx message back. >> + * >> + * Basically, we needn't worry about MSI messages. However, >> + * it's not harmful and there might be cases of PCI config data >> + * lost because of cached PCI config data in QEMU again. >> + * >> + * Note that we should flash the message prior to enabling >> + * the corresponding interrupt by request_irq(). >> + */ >> + get_cached_msi_msg(irq, &msg); >> + write_msi_msg(irq, &msg); >> + >> ret = request_irq(irq, vfio_msihandler, 0, >> vdev->ctx[vector].name, trigger); >> if (ret) { > > -- Alexey -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html