On Mon, Nov 15, 2021 at 03:20:50PM -0600, Bjorn Helgaas wrote:
> [+cc Naveen, NVMe, VMD folks]
>
> On Mon, Nov 15, 2021 at 07:17:01AM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215027
> >
> >             Bug ID: 215027
> >            Summary: "PCIe Bus Error: severity=Corrected, type=Physical
> >                     Layer" flood on Intel VMD + Samsung NVMe combination
> >            Product: Drivers
> >            Version: 2.5
> >     Kernel Version: mainline, linux-next
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: PCI
> >           Assignee: drivers_pci@xxxxxxxxxxxxxxxxxxxx
> >           Reporter: kai.heng.feng@xxxxxxxxxxxxx
> >         Regression: No
> >
> > The following tests (and any combination of them) don't help:
> > - Change NVMe LTR value to 0 or any other number
> > - Disable NVMe APST
> > - Disable PCIe ASPM
> > - Any version of kernel, including linux-next
> > - "Fix long standing AER Error Handling Issues" patch series [1]
> >
> > [1]
> > https://lore.kernel.org/linux-pci/cover.1635179600.git.naveennaidu479@xxxxxxxxx/
>
> Thanks a lot for the report, Kai-Heng.  It's on v5.15, which is good,
> and not marked as a regression.
>
> Samples from dmesg:
>
>   [    0.408995] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
>   [    0.410076] acpi PNP0A08:00: _OSC: platform does not support [AER]
>   [    0.412207] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]
>   [    1.367220] vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
>   [    1.490742] vmd 0000:00:0e.0: Bound to PCI domain 10000
>   [    1.569083] nvme nvme0: pci function 10000:e1:00.0
>   [    1.571421] pcieport 10000:e0:06.0: can't derive routing for PCI INT A
>   [    1.573997] nvme 10000:e1:00.0: PCI INT A: not connected
>   [    1.579028] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
>   [    1.584839] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver)
>   [    1.587454] nvme 10000:e1:00.0: device [144d:a80a] error status/mask=00000001/0000e000
>   [    1.589502] nvme 10000:e1:00.0: [ 0] RxErr
>   [    1.589813] nvme nvme0: Shutdown timeout set to 10 seconds
>   [    1.591509] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
>   [    1.595252] pcieport 10000:e0:06.0: AER: can't find device of IDe100
>   [    1.597213] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
>   ...

Just for testing purposes, does it still produce the repeated error
messages if you disable VMD?
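
For anyone reading along, the "error status/mask=00000001/0000e000" line in
the log can be decoded by hand. Below is a minimal sketch, assuming the
standard bit layout of the AER Correctable Error Status register and the
short names the Linux AER driver prints for each bit. The helper names
(parse_status_mask, decode_correctable) are made up for illustration and
are not kernel functions.

```python
# Assumed bit layout: AER Correctable Error Status register, labelled with
# the short names the Linux AER driver prints (e.g. "RxErr" in the log).
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def parse_status_mask(field):
    """Split a 'status/mask' pair of hex strings into two integers."""
    status, mask = (int(part, 16) for part in field.split("/"))
    return status, mask


def decode_correctable(value):
    """Return the names of all bits set in a correctable status or mask value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            # Bits without a known name are reported by position.
            names.append(CORRECTABLE_BITS.get(bit, f"bit{bit}"))
    return names


# Hypothetical usage with the value copied from the dmesg sample.
status, mask = parse_status_mask("00000001/0000e000")
reported = decode_correctable(status & ~mask)  # errors that are not masked
masked = decode_correctable(mask)              # error types masked off

print("reported:", reported)
print("masked:  ", masked)
```

Under those assumptions, the only unmasked bit is bit 0, which matches the
"[ 0] RxErr" line in the log; the mask covers bits 13-15 only, so Receiver
Errors are not masked on this device.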