Re: [PATCH v6 04/10] PCI/MSI: Don't disable MSI/MSI-X at shutdown

On Mon, Apr 13, 2015 at 11:45:31AM -0500, Eric W. Biederman wrote:
> Bjorn Helgaas <bhelgaas@xxxxxxxxxx> writes:
> 
> > On Mon, Apr 13, 2015 at 4:37 AM, Fam Zheng <famz@xxxxxxxxxx> wrote:
> >> Hi Bjorn,
> >>
> >> On Fri, 04/10 17:54, Bjorn Helgaas wrote:
> >>> From: Michael S. Tsirkin <mst@xxxxxxxxxx>
> >>>
> >>> d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown
> >>> v2") disabled MSI/MSI-X at device shutdown to address a kexec problem.
> >>>
> >>> The problem is that after we disable MSI, the device may assert INTx, and
> >>> if the driver hasn't registered an interrupt handler for it, the interrupt
> >>> is never deasserted and causes a kernel hang.  In particular, this was
> >>> observed with virtio.
> >>>
> >>> We now disable MSI/MSI-X for all devices during enumeration regardless of
> >>> CONFIG_PCI_MSI.  This solves the kexec problem in the new kernel, not the
> >>> old one.
> >>>
> >>> Stop disabling MSIs at shutdown to avoid the kernel hang.
> >>>
> >>> XXX bugzilla reference, details about how the hang happens?
> >>
> >> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96571
> >>
> >> Please let me know if you need any further information in the bug.
> >
> > Please attach a complete dmesg log.  The bugzilla doesn't really have
> > any new information other than that you see a soft lockup.  I'm trying
> > to connect more of the dots between a spurious interrupt and a hang or
> > soft lockup.
> >
> 
> The bugzilla implies that there is a screaming irq (which causes the
> softlockup when they disable the kernel's protections for buggy irqs).
> 
> > It doesn't seem right that a spurious interrupt could cause a hang or
> > soft lockup.
> 
> The interrupt handler keeps firing.
> 
> > I would think Linux would emit a message about the
> > unexpected interrupt, but would otherwise be relatively unconcerned.
> 
> That was disabled on the kernel command line.
> 
> > So I'm trying to figure out why my assumption is wrong.  Probably this
> > is just because I don't know much about Linux IRQ handling.
> >
> > Having more details, e.g., a stacktrace fragment from a soft lockup,
> > can also help people connect a problem they're seeing with the
> > solution.  It's pretty hard to google for "kernel hang," but if you
> > can google for a soft lockup in a specific function, that can be much
> > more useful.
> 
> The thing is, not disabling msi interrupts for the case described in the
> bugzilla report is the wrong fix.
> 
> The report is about a buggy driver doing the wrong thing.  Until someone
> ships a system that is msi native (aka no intx support) disabling msi
> interrupts at shutdown is the right thing to do.  If there is something
> that handles intx interrupts it is not an msi native system.
> 
> The real bug is probably disabling buggy interrupt detection on the
> kernel command line.
> 
> Beyond that to handle kexec cleanly something needs to stop the
> interrupts and stop the DMA transfers.  Which in the short term
> means someone probably needs to write a shutdown method for the buggy
> driver.
> 
> An interrupt coming in almost always implies a DMA having completed,
> and if that DMA completed in the wrong spot the kexec'd kernel will be
> toast.
> 
> We disable interrupts at boot so that a kernel started with
> kexec-on-panic (which doesn't shut anything down) can boot.  There are
> probably other valid use cases (like native msi interrupts) but I am not
> aware of them.  But according to the pci spec shutting down msi
> interrupts at boot should be a noop.
> 
> So in summary not disabling MSI/MSI-X at shutdown is the wrong fix,
> and someone needs to fix a buggy driver.
> 
> Eric

I'm not all that worried about this patch making it into stable.  So I
suggest for now we ignore the bugzilla and just focus on the patch
itself.

And the patch itself is not about a buggy driver.  It's about
a correct driver causing screaming interrupts because
the PCI core decided to disable MSI at shutdown.

Disabling MSI at shutdown is not necessary, for two reasons:
- earlier patches now disable MSI when the new kernel starts
  after kexec
- suppressing DMA automatically suppresses MSI as well
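
To make the second point concrete, here is a rough sketch of the
signalling rules as I understand them from the PCI spec.  It is a toy
model, not kernel code: struct toy_pci_func and toy_signal_irq() are
made-up names for illustration, and only the three bit values are taken
from the spec (they match pci_regs.h).  An MSI is a memory write issued
by the device, so it needs Bus Master Enable just like DMA does, and a
function with MSI enabled must not fall back to INTx.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values from the PCI spec (same values as Linux's pci_regs.h). */
#define PCI_COMMAND_MASTER        0x0004  /* Bus Master Enable */
#define PCI_COMMAND_INTX_DISABLE  0x0400  /* INTx assertion disable */
#define PCI_MSI_FLAGS_ENABLE      0x0001  /* MSI Enable in Message Control */

enum irq_signal { SIGNAL_NONE, SIGNAL_MSI, SIGNAL_INTX };

/* Hypothetical model of the config-space state that matters here. */
struct toy_pci_func {
	uint16_t command;    /* PCI Command register */
	uint16_t msi_flags;  /* MSI Message Control register */
};

/* How would this function signal a pending interrupt? */
static enum irq_signal toy_signal_irq(const struct toy_pci_func *f)
{
	if (f->msi_flags & PCI_MSI_FLAGS_ENABLE) {
		/*
		 * MSI is a memory write from the device, so it can only
		 * be sent while bus mastering is enabled.  INTx must not
		 * be used while MSI is enabled.
		 */
		return (f->command & PCI_COMMAND_MASTER) ?
			SIGNAL_MSI : SIGNAL_NONE;
	}

	/* MSI disabled: the device falls back to INTx unless masked. */
	return (f->command & PCI_COMMAND_INTX_DISABLE) ?
		SIGNAL_NONE : SIGNAL_INTX;
}
```

In this model, clearing PCI_COMMAND_MASTER while MSI stays enabled
yields SIGNAL_NONE, whereas clearing PCI_MSI_FLAGS_ENABLE alone yields
SIGNAL_INTX, which is the screaming-interrupt case when no handler is
registered for INTx.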




-- 
MST