Re: A question of msix feature

On Tue, Oct 22, 2013 at 8:02 PM, wubo <wuborush@xxxxxxxxx> wrote:
> Hi, all
>
> Sorry for troubling you.
> We are developing MSI-X support for our product. Unfortunately, it
> crashes the kernel on a server PC whose CPU is an Intel(R) Xeon(R)
> E5645, although we are sure that our driver works on a common
> personal PC.
>
> A piece of code in our driver looks like this:
>     for (i = 0; i < msix_num; i++) {
>         msix = &pcie->msix_entries[i];
>         msix->entry = i;
>     }
>     ret = pci_enable_msix(XX);
>     for (i = 0; i < msix_num; i++) {
>         msix = &pcie->msix_entries[i];
>         ret = request_irq(msix->vector, XX);
>     }
>
> BTW, the kernel crash info is as follows:
> [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 32993
> [Hardware Error]: APEI generic hardware error status
> [Hardware Error]: severity: 1, fatal
> [Hardware Error]: section: 0, severity: 1, fatal
> [Hardware Error]: flags: 0x01
> [Hardware Error]: primary
> [Hardware Error]: section_type: PCIe error
> [Hardware Error]: port_type: 0, PCIe end point
> [Hardware Error]: version: 1.0
> [Hardware Error]: command: 0x0407, status: 0x0010
> [Hardware Error]: device_id: 0000:04:00.0
> [Hardware Error]: slot: 2
> [Hardware Error]: secondary_bus: 0x00
> [Hardware Error]: vendor_id: 0x1c5f, device_id: 0x0530
> [Hardware Error]: class_code: 008001
>
> Am I missing something important? Can anybody give me some hints?

It's very difficult to give any hints based on so little information.
The error looks like a PCIe hardware issue, which should not
necessarily cause the kernel itself to crash (and if the kernel *did*
crash, you didn't include any information about that).

I don't know how to interpret this APEI error info.  It's possible
that your BIOS logged it and can give more details.  The most likely
problem is that you programmed some incorrect MSI address/data info
into the device, and when it attempted to signal an MSI, it caused the
error.  Or it could be a regular device DMA gone awry.
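
The command and status values in the quoted log are ordinary PCI
configuration registers, so they can be decoded by hand.  The sketch
below is a small user-space C helper, not kernel code; the function
and table names are made up for illustration, and the bit masks are
the standard ones from the PCI specification (the same values Linux
defines in pci_regs.h).  Under that reading, command 0x0407 means I/O
space, memory space and bus mastering are enabled with INTx disabled,
and status 0x0010 carries only the capabilities-list bit, with no
abort or parity bits set.  That is consistent with a device set up for
MSI/MSI-X, but it does not by itself identify the cause of the error.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One named bit of a 16-bit PCI configuration register. */
struct bit_name {
    uint16_t mask;
    const char *name;
};

/* PCI command register bits (values as in the PCI specification). */
static const struct bit_name command_bits[] = {
    { 0x0001, "IO" },            /* I/O space decoding enabled */
    { 0x0002, "MEMORY" },        /* memory space decoding enabled */
    { 0x0004, "MASTER" },        /* bus mastering (DMA, MSI writes) enabled */
    { 0x0040, "PARITY" },        /* parity error response enabled */
    { 0x0100, "SERR" },          /* SERR# reporting enabled */
    { 0x0400, "INTX_DISABLE" },  /* legacy INTx assertion disabled */
};

/* PCI status register bits (values as in the PCI specification). */
static const struct bit_name status_bits[] = {
    { 0x0010, "CAP_LIST" },            /* capability list present */
    { 0x0100, "MASTER_DATA_PARITY" },  /* master data parity error */
    { 0x0800, "SIG_TARGET_ABORT" },    /* signaled target abort */
    { 0x1000, "REC_TARGET_ABORT" },    /* received target abort */
    { 0x2000, "REC_MASTER_ABORT" },    /* received master abort */
    { 0x4000, "SIG_SYSTEM_ERROR" },    /* signaled system error */
    { 0x8000, "DETECTED_PARITY" },     /* detected parity error */
};

/* Write the names of all set bits into buf, separated by single spaces. */
void decode_bits(uint16_t value, const struct bit_name *table, size_t count,
                 char *buf, size_t buflen)
{
    assert(buflen > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (!(value & table[i].mask))
            continue;
        size_t used = strlen(buf);
        snprintf(buf + used, buflen - used, "%s%s",
                 used ? " " : "", table[i].name);
    }
}

void decode_command(uint16_t value, char *buf, size_t buflen)
{
    decode_bits(value, command_bits,
                sizeof(command_bits) / sizeof(command_bits[0]), buf, buflen);
}

void decode_status(uint16_t value, char *buf, size_t buflen)
{
    decode_bits(value, status_bits,
                sizeof(status_bits) / sizeof(status_bits[0]), buf, buflen);
}
```

Calling decode_command(0x0407, buf, sizeof(buf)) and
decode_status(0x0010, buf, sizeof(buf)) with a buffer of around 128
bytes is enough for the values in this report.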

You could compare your driver's MSI handling with other drivers in the
tree.  You could try to figure out the difference between the "common
personal PC" (where your driver apparently works) and the server PC
(where it fails) -- boot the server with a reduced configuration
(fewer CPUs, fewer other devices, etc.) to make it more like the
personal PC.  You could try using fewer MSI-X IRQs.  You could try
using MSI or line-based interrupts to make sure it's really an
MSI-related problem.
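
One way to experiment with fewer MSI-X IRQs is to check the result of
the enable call and retry with a smaller count.  In kernels of this
era, pci_enable_msix() returns 0 on success, a negative errno on
failure, and a positive number when fewer vectors are available than
requested.  The following is a user-space model of that retry loop
only; enable_fn, enable_with_retry and fake_enable are hypothetical
names, and fake_enable merely imitates the return convention so the
logic can be exercised outside the kernel.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for an MSI-X enable call with the old
 * pci_enable_msix() return convention:
 *   0   -> nvec vectors were enabled
 *   < 0 -> hard failure (negative errno)
 *   > 0 -> request too large; the value is how many could be allocated
 */
typedef int (*enable_fn)(void *ctx, int nvec);

/*
 * Try to enable `wanted` vectors, shrinking the request whenever the
 * callback reports that fewer are available.  Returns the number of
 * vectors actually enabled, or a negative errno.
 */
int enable_with_retry(enable_fn enable, void *ctx, int wanted)
{
    int nvec = wanted;

    assert(wanted > 0);
    while (nvec > 0) {
        int ret = enable(ctx, nvec);

        if (ret == 0)
            return nvec;        /* success with the current count */
        if (ret < 0)
            return ret;         /* hard error: caller should fall back */
        if (ret >= nvec)
            return -ENOSPC;     /* no progress; avoid looping forever */
        nvec = ret;             /* retry with the smaller count */
    }
    return -ENOSPC;
}

/* Test double: pretends the platform can supply at most *ctx vectors. */
int fake_enable(void *ctx, int nvec)
{
    int limit = *(int *)ctx;

    if (limit <= 0)
        return -ENOSPC;
    return nvec <= limit ? 0 : limit;
}
```

In a real driver the callback would be replaced by the actual enable
call on the device, and request_irq() would be issued only for the
number of vectors that was actually granted, with each return value
checked.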

Since most drivers do use MSI-X successfully, the problem is likely in
your driver, not in the Linux PCI code.  I've given you some hints
above, but in general, people don't have time to help debug
proprietary, out-of-tree drivers.

Bjorn