PCIe error reporting


 



I'm trying to determine how a PCIe card we are building handles (and
hopefully recovers from) PCIe link errors.
However, I'm not at all sure what I should expect the x86 Linux host to do.

The card has an Altera FPGA and I can monitor things like changes to
its LTSSM state machine, but not quite the full operation of the PCIe logic.

I've enabled AER, and lspci seems to decode most of the bits, but
it looks as though something needs to detect the error bits being set,
log the error, and then clear them.
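For reference, a minimal sketch of the "detect and log" half, assuming the
raw AER status register values have already been read (for example from the
UESta/CESta lines of lspci -vv). The bit positions are the ones defined for
the AER Uncorrectable and Correctable Error Status registers; the table and
function names are only illustrative.

```python
# Bit positions of the AER Uncorrectable Error Status register.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}

# Bit positions of the AER Correctable Error Status register.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode_status(value, table):
    """Return the names of all bits set in a 32-bit status value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            # Bits missing from the table are reported by number only.
            names.append(table.get(bit, "bit %d (not in table)" % bit))
    return names
```

Both status registers are write-1-to-clear, so whatever logs the error
would clear it by writing the value it just read back to the same register.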

I did a rather brutal test: I shorted the TX lines after the caps.
The card's PCIe logic issued a reset to the internal logic before
bringing the PCIe link back up.
I could then read config space - but the BARs were all zero
(I think lspci reported the old values, but the -x data showed zeros).
Nothing seemed to indicate that Linux thought anything was wrong.
Not surprisingly, reads returned ~0u.
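The "config space readable but BARs zero" state can be checked
mechanically. A rough sketch, assuming a type 0 configuration header and at
least the first 64 bytes of config space in hand (for example the bytes
shown by lspci -x); the function name and the returned keys are made up for
illustration.

```python
import struct


def header_state(cfg):
    """Summarise a type 0 config header (first 64 bytes of config space)."""
    # Vendor ID at 0x00, Device ID at 0x02, Command register at 0x04.
    vendor, _device, command = struct.unpack_from("<HHH", cfg, 0x00)
    # Six 32-bit BARs occupy offsets 0x10 to 0x27.
    bars = struct.unpack_from("<6I", cfg, 0x10)
    return {
        # A vendor ID of 0xFFFF means nothing answered the config read.
        "responding": vendor != 0xFFFF,
        "memory_enabled": bool(command & 0x0002),  # Memory Space Enable
        "bus_master": bool(command & 0x0004),      # Bus Master Enable
        "bars_all_zero": all(b == 0 for b in bars),
    }
```

A device that responds but reports bars_all_zero with memory_enabled clear
has most likely been reset behind the host's back, which would match memory
reads coming back as all ones.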

I should really try a much shorter error.

On that system (Xeon E5-2600) dmesg contains (retyped):
  acpi: PNP0A08:00: _OSC: platform does not support [AER]

Another system is even more 'useless': it reports "AER handled
by the firmware".
If we take the PCIe link down (even after echo 1 >/sys/devices/.../remove),
something generates an NMI!

Is this all 'expected' behaviour?
Anything else I should/could be looking at?
Is there anything that will poll the AER bits for me?
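In case it helps frame the question, here is the sort of userspace poller I
have in mind. It is only a sketch under assumptions: it reads the sysfs
config file of the device (reading past the first 64 bytes normally needs
root), walks the extended capability list from offset 0x100 looking for the
AER capability (ID 0x0001), and reads the uncorrectable and correctable
status registers at offsets 0x04 and 0x10 from it. The function names and
the device address in the usage line are hypothetical, and clearing the
bits from userspace may fight with firmware or a kernel driver that owns AER.

```python
import struct
import time

AER_CAP_ID = 0x0001
AER_UNCOR_STATUS = 0x04  # offset from the start of the AER capability
AER_COR_STATUS = 0x10


def find_ext_cap(cfg, cap_id):
    """Return the offset of an extended capability in cfg, or None."""
    offset = 0x100
    seen = set()  # guards against a looping capability list
    while offset >= 0x100 and offset + 4 <= len(cfg) and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):
            return None  # no extended capabilities, or device not responding
        if header & 0xFFFF == cap_id:
            return offset
        offset = (header >> 20) & 0xFFC  # next pointer lives in bits 31:20
    return None


def read_aer_status(config_path, clear=False):
    """Return (uncorrectable, correctable) status, or None without AER."""
    with open(config_path, "r+b" if clear else "rb", buffering=0) as f:
        cfg = f.read(4096)
        aer = find_ext_cap(cfg, AER_CAP_ID)
        if aer is None or aer + AER_COR_STATUS + 4 > len(cfg):
            return None
        (uncor,) = struct.unpack_from("<I", cfg, aer + AER_UNCOR_STATUS)
        (cor,) = struct.unpack_from("<I", cfg, aer + AER_COR_STATUS)
        if clear:
            # Status bits are write-1-to-clear: write back what was read.
            for reg, value in ((AER_UNCOR_STATUS, uncor), (AER_COR_STATUS, cor)):
                if value:
                    f.seek(aer + reg)
                    f.write(struct.pack("<I", value))
    return uncor, cor


def poll(config_path, interval=1.0, count=None):
    """Print any non-zero AER status every `interval` seconds."""
    n = 0
    while count is None or n < count:
        status = read_aer_status(config_path)
        if status is not None and any(status):
            print("AER uncorrectable=0x%08x correctable=0x%08x" % status)
        time.sleep(interval)
        n += 1
```

Usage would be something like
poll("/sys/bus/pci/devices/0000:03:00.0/config"), with the address replaced
by the card's own.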

	David




