Re: 4.4.x kernel (only) gives pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4

Hi Marc,

On Sat, Feb 13, 2016 at 01:57:36PM -0800, Marc MERLIN wrote:
> Howdy,
> 
> I just upgraded my laptop to a Lenovo thinkpad P70 (skylake), moved my linux
> image (4.4.1 kernel), and I'm pseudo-randomly getting these:
> 
> pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
> pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
> pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000
> pcieport 0000:00:1c.4:    [12] Replay Timer Timeout
> pcieport 0000:00:1c.4: AER: Corrected error received: id=00e4
> pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
> pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000
> pcieport 0000:00:1c.4:    [12] Replay Timer Timeout
> 
> pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
> pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
> pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00001000/00002000
> pcieport 0000:00:1c.4:    [12] Replay Timer Timeout
> pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
> pcieport 0000:00:1c.4: can't find device of ID00e4
> pcieport 0000:00:1c.4: AER: Multiple Corrected error received: id=00e4
> pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e4(Transmitter ID)
> 
> They did not seem to be happening with 4.3.3 kernel.
> With 4.4.1, I've had a boot where I got so many of those that the machine was unusable.
> Other times, it happens a bit, and stops.
> My last boot, it didn't happen at all.
> 
> Sadly, I have no idea what they mean, what I should do about them, and
> why they only seem to be happening with 4.4.1 and not older kernels.
> 
> Boot log: http://marc.merlins.org/tmp/4.1.4.boot.txt
> config.gz: http://marc.merlins.org/tmp/4.1.4.config.gz
> 
> 8086:a114 is this:
> PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
> 00:1c.4 0604: 8086:a114 (rev f1) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0, IRQ 123
>         Bus: primary=00, secondary=05, subordinate=6f, sec-latency=0
>         I/O behind bridge: 00002000-00002fff
>         Memory behind bridge: a4000000-ba0fffff
>         Prefetchable memory behind bridge: 0000000080000000-00000000a1ffffff
>         Capabilities: [40] Express Root Port (Slot+), MSI 00
>         Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>         Capabilities: [90] Subsystem: 17aa:222d
>         Capabilities: [a0] Power Management version 3
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [140] Access Control Services
>         Capabilities: [220] #19
>         Kernel driver in use: pcieport
> 
> Can someone offer some suggestions?

Thanks a lot for your report.  I think this is probably the same issue
reported in these bug reports:

  https://bugzilla.kernel.org/show_bug.cgi?id=109691
  https://bugzilla.kernel.org/show_bug.cgi?id=111601
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173

Short story: the AER driver receives the corrected error notification
but fails to clear it, so the same error keeps being reported.  Nobody
has stepped up to fix the bug yet.  You can probably work around it by
booting with "pci=noaer", which disables AER completely.
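
To confirm after a reboot that the option actually reached the kernel,
the running command line can be checked.  This is only a minimal
sketch: it assumes the usual comma-separated "pci=" option syntax, and
the helper name has_pci_option is made up for illustration.

```python
def has_pci_option(cmdline, option):
    """Return True if `option` appears in any pci=... parameter of cmdline."""
    prefix = "pci="
    for token in cmdline.split():
        if token.startswith(prefix):
            # Several pci options may be combined, e.g. "pci=nomsi,noaer".
            if option in token[len(prefix):].split(","):
                return True
    return False
```

Passing it the contents of /proc/cmdline and the string "noaer" should
tell you whether AER was disabled for the current boot.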

I attached your dmesg log to
https://bugzilla.kernel.org/show_bug.cgi?id=111601
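
For reading the quoted log lines: the "id" is a PCI requester ID (bus
in the high byte, then 5 bits of device and 3 bits of function), and
"error status/mask" are the AER Correctable Error Status and Mask
registers.  The sketch below decodes both; the bit names follow the
PCIe specification, and the function names are hypothetical, chosen
only for this illustration.

```python
# Correctable Error Status/Mask bits defined by the PCIe AER capability.
COR_ERR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_source_id(source_id):
    """Split a 16-bit requester ID into bus:device.function notation."""
    bus = (source_id >> 8) & 0xFF
    dev = (source_id >> 3) & 0x1F
    fn = source_id & 0x7
    return "%02x:%02x.%d" % (bus, dev, fn)


def decode_cor_bits(value):
    """List the names of the correctable-error bits set in `value`."""
    return [name for bit, name in sorted(COR_ERR_BITS.items())
            if value & (1 << bit)]
```

With the values from the log, id=00e4 decodes to 00:1c.4 (the root
port itself), status 00001000 is the Replay Timer Timeout bit, and
mask 00002000 means only Advisory Non-Fatal errors are masked.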

Bjorn