Re: [Bug 215027] New: "PCIe Bus Error: severity=Corrected, type=Physical Layer" flood on Intel VMD + Samsung NVMe combination

On Mon, Nov 15, 2021 at 03:20:50PM -0600, Bjorn Helgaas wrote:
> [+cc Naveen, NVMe, VMD folks]
> 
> On Mon, Nov 15, 2021 at 07:17:01AM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215027
> > 
> >             Bug ID: 215027
> >            Summary: "PCIe Bus Error: severity=Corrected, type=Physical
> >                     Layer" flood on Intel VMD + Samsung NVMe combination
> >            Product: Drivers
> >            Version: 2.5
> >     Kernel Version: mainline, linux-next
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: PCI
> >           Assignee: drivers_pci@xxxxxxxxxxxxxxxxxxxx
> >           Reporter: kai.heng.feng@xxxxxxxxxxxxx
> >         Regression: No
> > 
> > The following tests (and any combination of them) don't help:
> > - Change NVMe LTR value to 0 or any other number
> > - Disable NVMe APST
> > - Disable PCIe ASPM
> > - Any version of kernel, including linux-next
> > - "Fix long standing AER Error Handling Issues" patch series [1]
> > 
> > [1]
> > https://lore.kernel.org/linux-pci/cover.1635179600.git.naveennaidu479@xxxxxxxxx/
> 
> Thanks a lot for the report, Kai-Heng.  It's on v5.15, which is good,
> and not marked as a regression.  Samples from dmesg:
> 
>   [    0.408995] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
>   [    0.410076] acpi PNP0A08:00: _OSC: platform does not support [AER]
>   [    0.412207] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]
>   [    1.367220] vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
>   [    1.490742] vmd 0000:00:0e.0: Bound to PCI domain 10000
>   [    1.569083] nvme nvme0: pci function 10000:e1:00.0
>   [    1.571421] pcieport 10000:e0:06.0: can't derive routing for PCI INT A
>   [    1.573997] nvme 10000:e1:00.0: PCI INT A: not connected
>   [    1.579028] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
>   [    1.584839] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver)
>   [    1.587454] nvme 10000:e1:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
>   [    1.589502] nvme 10000:e1:00.0:    [ 0] RxErr
>   [    1.589813] nvme nvme0: Shutdown timeout set to 10 seconds
>   [    1.591509] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
>   [    1.595252] pcieport 10000:e0:06.0: AER: can't find device of IDe100
>   [    1.597213] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
>   ...

Just for testing purposes: does the system still produce the flood of
corrected error messages if you disable VMD?
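
In case it helps with that comparison, here is a minimal user-space sketch
(not something from the report; it assumes the 10000:e1:00.0 address from
the dmesg excerpt above and needs root) that walks the device's extended
capability list through its sysfs config file and dumps the AER Correctable
Error Status register, so you can check whether Receiver Errors are still
being logged independently of the message flood:

/*
 * Minimal sketch only: dump the AER Correctable Error Status register of a
 * PCI device through its sysfs config space.  The default BDF below is the
 * 10000:e1:00.0 address from the dmesg excerpt above (an assumption on my
 * part); pass another sysfs config path as argv[1] to override it.  Reading
 * extended config space this way needs root.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define PCI_EXT_CAP_START	0x100
#define PCI_EXT_CAP_ID_ERR	0x0001	/* Advanced Error Reporting */
#define PCI_ERR_COR_STATUS	0x10	/* offset within the AER capability */

static uint32_t read32(int fd, off_t off)
{
	uint32_t v = 0;

	if (pread(fd, &v, sizeof(v), off) != sizeof(v)) {
		perror("pread");
		exit(1);
	}
	return v;
}

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] :
		"/sys/bus/pci/devices/10000:e1:00.0/config";
	off_t pos = PCI_EXT_CAP_START;
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		perror(path);
		return 1;
	}

	/* Walk the PCIe extended capability list looking for AER. */
	while (pos) {
		uint32_t hdr = read32(fd, pos);

		if (hdr == 0 || hdr == 0xffffffff)
			break;	/* no extended capabilities visible */

		if ((hdr & 0xffff) == PCI_EXT_CAP_ID_ERR) {
			uint32_t cor = read32(fd, pos + PCI_ERR_COR_STATUS);

			printf("AER cap at 0x%03lx, COR_STATUS=%08x%s\n",
			       (long)pos, cor,
			       (cor & 0x1) ? " (Receiver Error logged)" : "");
			close(fd);
			return 0;
		}
		pos = hdr >> 20;	/* next capability offset, 0 ends the list */
	}

	printf("no AER capability found in %s\n", path);
	close(fd);
	return 0;
}

Build and run it as root (e.g. "gcc -o aer_cor aer_cor.c && sudo ./aer_cor",
file name is just an example); bit 0 of the printed status corresponds to
the "[ 0] RxErr" line in the log, so it can be compared before and after
toggling VMD.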


