On Mon, Apr 13, 2015 at 10:41:22AM -0500, Bjorn Helgaas wrote:
> On Mon, Apr 13, 2015 at 4:37 AM, Fam Zheng <famz@xxxxxxxxxx> wrote:
> > Hi Bjorn,
> >
> > On Fri, 04/10 17:54, Bjorn Helgaas wrote:
> >> From: Michael S. Tsirkin <mst@xxxxxxxxxx>
> >>
> >> d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown
> >> v2") disabled MSI/MSI-X at device shutdown to address a kexec problem.
> >>
> >> The problem is that after we disable MSI, the device may assert INTx, and
> >> if the driver hasn't registered an interrupt handler for it, the interrupt
> >> is never deasserted and causes a kernel hang.  In particular, this was
> >> observed with virtio.
> >>
> >> We now disable MSI/MSI-X for all devices during enumeration regardless of
> >> CONFIG_PCI_MSI.  This solves the kexec problem in the new kernel, not the
> >> old one.
> >>
> >> Stop disabling MSIs at shutdown to avoid the kernel hang.
> >>
> >> XXX bugzilla reference, details about how the hang happens?
> >
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96571
> >
> > Please let me know if you need any further information in the bug.
>
> Please attach a complete dmesg log.  The bugzilla doesn't really have
> any new information other than that you see a soft lockup.  I'm trying
> to connect more of the dots between a spurious interrupt and a hang or
> soft lockup.
>
> It doesn't seem right that a spurious interrupt could cause a hang or
> soft lockup.  I would think Linux would emit a message about the
> unexpected interrupt, but would otherwise be relatively unconcerned.
> So I'm trying to figure out why my assumption is wrong.  Probably this
> is just because I don't know much about Linux IRQ handling.
>
> Having more details, e.g., a stacktrace fragment from a soft lockup,
> can also help people connect a problem they're seeing with the
> solution.  It's pretty hard to google for "kernel hang," but if you
> can google for a soft lockup in a specific function, that can be much
> more useful.

I have investigated this, and at this point I think the hang is basically
a non-issue.

So the commit log should say:

	if the driver hasn't registered an interrupt handler for it, the
	interrupt is never deasserted and causes spurious interrupts,
	typically followed by the kernel disabling the irq.

> >> [bhelgaas: changelog]
> >> Reported-by: Fam Zheng <famz@xxxxxxxxxx>
> >> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> >> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> >> CC: Yinghai Lu <yhlu.kernel.send@xxxxxxxxx>
> >> CC: Ulrich Obergfell <uobergfe@xxxxxxxxxx>
> >> CC: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
> >> ---
> >>  drivers/pci/pci-driver.c |    2 --
> >>  1 file changed, 2 deletions(-)
> >>
> >> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> >> index 3cb2210de553..38a602cb9fb7 100644
> >> --- a/drivers/pci/pci-driver.c
> >> +++ b/drivers/pci/pci-driver.c
> >> @@ -450,8 +450,6 @@ static void pci_device_shutdown(struct device *dev)
> >>
> >>       if (drv && drv->shutdown)
> >>               drv->shutdown(pci_dev);
> >> -     pci_msi_shutdown(pci_dev);
> >> -     pci_msix_shutdown(pci_dev);
> >>
> >>  #ifdef CONFIG_KEXEC
> >>       /*
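
To make the "spurious interrupts, typically followed by the kernel disabling
the irq" sequence concrete, here is a rough userspace model of that
accounting. It is not kernel code. The struct, the function names and the
printed message are hypothetical, and the two thresholds are assumptions
based on a reading of kernel/irq/spurious.c, so check them against the tree
you care about. The idea is only that a level-triggered INTx line which
nobody handles keeps firing until the unhandled count trips a limit, after
which the line is shut off and the system carries on.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed thresholds, modeled on kernel/irq/spurious.c; verify before relying on them. */
#define SPURIOUS_WINDOW    100000u /* interrupts per accounting window */
#define SPURIOUS_THRESHOLD  99900u /* unhandled count that trips the disable */

/* Hypothetical stand-in for the per-IRQ state the kernel keeps. */
struct irq_model {
	unsigned int irq_count;      /* interrupts seen in the current window */
	unsigned int irqs_unhandled; /* of those, how many no handler claimed */
	bool disabled;               /* set once the line has been shut off */
};

/* Account for one delivered interrupt; handled is false when nobody claimed it. */
static void model_note_interrupt(struct irq_model *d, bool handled)
{
	if (d->disabled)
		return;
	if (!handled)
		d->irqs_unhandled++;
	if (++d->irq_count < SPURIOUS_WINDOW)
		return;

	/* End of window: decide whether the line looks stuck, then start over. */
	d->irq_count = 0;
	if (d->irqs_unhandled > SPURIOUS_THRESHOLD) {
		printf("model: nobody cared, disabling irq\n");
		d->disabled = true;
	}
	d->irqs_unhandled = 0;
}

/*
 * Model an INTx line that stays asserted with no handler registered.
 * Returns how many interrupts were delivered before the line was
 * disabled, or max if it never was.
 */
static unsigned int model_stuck_intx(struct irq_model *d, unsigned int max)
{
	unsigned int n = 0;

	while (n < max && !d->disabled) {
		model_note_interrupt(d, false);
		n++;
	}
	return n;
}
```

Calling model_stuck_intx() on a zeroed struct irq_model stops after one full
window with the line marked disabled, while a line whose interrupts are all
handled is never disabled. That matches the behaviour described above: a
burst of spurious interrupts and then a disabled irq, rather than a permanent
hang.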