Re: [PATCH v6 04/10] PCI/MSI: Don't disable MSI/MSI-X at shutdown

Bjorn Helgaas <bhelgaas@xxxxxxxxxx> writes:

> On Mon, Apr 13, 2015 at 4:37 AM, Fam Zheng <famz@xxxxxxxxxx> wrote:
>> Hi Bjorn,
>>
>> On Fri, 04/10 17:54, Bjorn Helgaas wrote:
>>> From: Michael S. Tsirkin <mst@xxxxxxxxxx>
>>>
>>> d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown
>>> v2") disabled MSI/MSI-X at device shutdown to address a kexec problem.
>>>
>>> The problem is that after we disable MSI, the device may assert INTx, and
>>> if the driver hasn't registered an interrupt handler for it, the interrupt
>>> is never deasserted and causes a kernel hang.  In particular, this was
>>> observed with virtio.
>>>
>>> We now disable MSI/MSI-X for all devices during enumeration regardless of
>>> CONFIG_PCI_MSI.  This solves the kexec problem in the new kernel, not the
>>> old one.
>>>
>>> Stop disabling MSIs at shutdown to avoid the kernel hang.
>>>
>>> XXX bugzilla reference, details about how the hang happens?
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96571
>>
>> Please let me know if you need any further information in the bug.
>
> Please attach a complete dmesg log.  The bugzilla doesn't really have
> any new information other than that you see a soft lockup.  I'm trying
> to connect more of the dots between a spurious interrupt and a hang or
> soft lockup.
>

The bugzilla implies that there is a screaming irq (which causes the
soft lockup when they disable the kernel's protections for buggy irqs).

> It doesn't seem right that a spurious interrupt could cause a hang or
> soft lockup.

The interrupt handler keeps firing.

> I would think Linux would emit a message about the
> unexpected interrupt, but would otherwise be relatively unconcerned.

That was disabled on the kernel command line.

> So I'm trying to figure out why my assumption is wrong.  Probably this
> is just because I don't know much about Linux IRQ handling.
>
> Having more details, e.g., a stacktrace fragment from a soft lockup,
> can also help people connect a problem they're seeing with the
> solution.  It's pretty hard to google for "kernel hang," but if you
> can google for a soft lockup in a specific function, that can be much
> more useful.

The thing is, not disabling msi interrupts for the case described in the
bugzilla report is the wrong fix.

The report is about a buggy driver doing the wrong thing.  Until someone
ships a system that is msi native (aka no intx support), disabling msi
interrupts at shutdown is the right thing to do.  If there is something
that handles intx interrupts it is not an msi native system.

The real bug is probably disabling buggy interrupt detection on the
kernel command line.
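
To make the effect of that detection concrete, here is a rough userspace
model of it.  It is a sketch, not the kernel code: the struct, the
function names and the fixed counting window are made up for
illustration, the thresholds are what I recall from
kernel/irq/spurious.c, and I am assuming the command line option in
question is noirqdebug.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds as recalled from kernel/irq/spurious.c; treat as assumptions. */
#define IRQ_SAMPLE_WINDOW   100000
#define IRQ_UNHANDLED_LIMIT  99900

/* Hypothetical, simplified stand-in for the per-irq bookkeeping. */
struct irq_model {
	unsigned int irq_count;       /* deliveries in the current window */
	unsigned int irqs_unhandled;  /* deliveries no handler claimed */
	bool disabled;                /* line has been shut off */
	bool noirqdebug;              /* detection turned off at boot */
};

/* Account for one delivery; returns true if the line is (now) disabled. */
bool note_interrupt_model(struct irq_model *d, bool handled)
{
	assert(d);
	if (d->disabled)
		return true;
	if (d->noirqdebug)
		return false;          /* nobody is counting */
	if (!handled)
		d->irqs_unhandled++;
	if (++d->irq_count < IRQ_SAMPLE_WINDOW)
		return false;          /* window not full yet */
	if (d->irqs_unhandled > IRQ_UNHANDLED_LIMIT)
		d->disabled = true;    /* almost nothing was handled */
	d->irq_count = 0;
	d->irqs_unhandled = 0;
	return d->disabled;
}

/*
 * Deliver up to max_deliveries unhandled interrupts and return how many
 * are delivered before the line is shut off (or max_deliveries if it
 * never is).
 */
unsigned long screaming_irq_deliveries(bool noirqdebug,
				       unsigned long max_deliveries)
{
	struct irq_model d = { .noirqdebug = noirqdebug };
	unsigned long n = 0;

	while (n < max_deliveries) {
		n++;
		if (note_interrupt_model(&d, false))
			break;
	}
	return n;
}
```

In this model, screaming_irq_deliveries(false, 1000000) stops at the
100000th delivery, while screaming_irq_deliveries(true, 1000000) runs
through all of them, which is the behaviour that ends in a soft lockup
when the source never stops.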

Beyond that, to handle kexec cleanly something needs to stop the
interrupts and stop the DMA transfers.  Which in the short term
means someone probably needs to write a shutdown method for the buggy
driver.

An interrupt coming in almost always implies a DMA having completed,
and if that DMA completed in the wrong spot the kexec'd kernel will be
toast.
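
As a toy illustration of why the driver's shutdown method is the piece
that matters, here is a userspace model of the shutdown path.
Everything in it is hypothetical (the toy_dev fields are not a real
register layout, and the function names are invented).  It only mirrors
the ordering as I understand it: the driver's shutdown hook runs if one
exists, then the core turns off msi.  In a real driver the hook would
do things like mask the device's interrupt sources and call
pci_clear_master().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; not a real register layout. */
struct toy_dev {
	bool irq_enabled;  /* device may raise interrupts */
	bool dma_enabled;  /* device may still write to memory */
	bool msi_enabled;  /* interrupts go out as msi instead of intx */
};

/* With msi off, an active interrupt source falls back to intx. */
bool toy_dev_asserts_intx(const struct toy_dev *d)
{
	return d->irq_enabled && !d->msi_enabled;
}

bool toy_dev_can_dma(const struct toy_dev *d)
{
	return d->dma_enabled;
}

/* Model of a driver shutdown method: quiesce the device itself. */
void toy_driver_shutdown(struct toy_dev *d)
{
	d->irq_enabled = false;  /* stop interrupts at the source */
	d->dma_enabled = false;  /* stop DMA before the next kernel runs */
}

/* Model of the core step the patch wants to remove. */
void toy_core_msi_shutdown(struct toy_dev *d)
{
	d->msi_enabled = false;
}

/* Model of the whole path: optional driver hook, then the core. */
void toy_device_shutdown(struct toy_dev *d,
			 void (*drv_shutdown)(struct toy_dev *))
{
	assert(d);
	if (drv_shutdown)
		drv_shutdown(d);
	toy_core_msi_shutdown(d);
}
```

Without a hook, toy_device_shutdown(&dev, NULL) leaves a device that
asserts intx and can still DMA.  With toy_driver_shutdown as the hook
it does neither, whether or not the core then disables msi.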

We disable interrupts at boot so that a kernel started with
kexec-on-panic (which doesn't shut anything down) can boot.  There are
probably other valid use cases (like native msi interrupts) but I am not
aware of them.  But according to the pci spec shutting down msi
interrupts at boot should be a noop.

So in summary not disabling MSI/MSI-X at shutdown is the wrong fix,
and someone needs to fix a buggy driver.

Eric