On Tue, Mar 04, 2025 at 05:04:21PM -0800, Jon Pan-Doh wrote:
> On Tue, Mar 4, 2025 at 10:32 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > It's true this is redundant information, but that e1000e device may
> > no longer be accessible.
> >
> > In that case, I think aer_get_device_error_info() would probably
> > return 0 because config reads would all return ~0, and
> > PCI_ERR_COR_STATUS & ~PCI_ERR_COR_MASK would be 0, so
> > we probably wouldn't see the e1000e messages at all.
>
> Wouldn't we have larger issues if the device is no longer accessible?
> Would a log suffice in that case (i.e. when aer_get_device_error()
> returns 0)? Something along the lines of "{device} is not accessible
> while processing (un)correctable error"

It's quite likely that a device is inaccessible after an uncorrectable
error.  DPC takes the link down automatically for uncorrectable errors,
but I don't think aer_print_port_info() is used in that case anyway.
Documentation/PCI/pci-error-recovery.rst mentions other cases where
the affected device is disconnected.

If the purpose of this patch is only to turn this:

  pcieport 0000:00:04.0: Correctable error message received from 0000:01:00.0
  e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
  e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/0000e000
  e1000e 0000:01:00.0: [ 6] BadTLP

into this:

  e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
  e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/0000e000
  e1000e 0000:01:00.0: [ 6] BadTLP

I don't think it's worth it.

I guess the problem is that future patches rate limit the e1000e
messages, and we really need to rate limit the pcieport message using
the same e1000e ratelimit_state.  We do know the Requester ID of the
device, so maybe we could look up that ratelimit_state?
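To make the Requester ID lookup idea concrete, here is a rough, untested userspace model of it; it is not kernel code.  In the kernel, the lookup would presumably go through pci_get_domain_bus_and_slot() (paired with pci_dev_put()), and the state would be a struct ratelimit_state checked with __ratelimit().  Everything named below (struct rl_state, struct fake_dev, make_rid(), rl_allow(), find_dev_by_rid(), aer_event_allowed()) is a hypothetical stand-in, and the window logic is a simplification of the real ratelimit code.  It also models the earlier point about inaccessible devices: if status and mask both read back as ~0, status & ~mask is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the kernel's struct ratelimit_state:
 * allow at most 'burst' messages per window of 'interval' ticks.
 */
struct rl_state {
	unsigned long interval;
	int burst;
	unsigned long begin;	/* start of the current window */
	int printed;		/* messages allowed in the current window */
};

/* Hypothetical per-device record; in the kernel this would live in struct pci_dev. */
struct fake_dev {
	uint16_t rid;		/* Requester ID: bus << 8 | devfn */
	struct rl_state cor_rs;	/* ratelimit for correctable-error messages */
};

/* Requester ID from bus/slot/function, e.g. 01:00.0 -> 0x0100. */
static uint16_t make_rid(unsigned int bus, unsigned int slot, unsigned int fn)
{
	assert(bus < 256 && slot < 32 && fn < 8);
	return (uint16_t)((bus << 8) | (slot << 3) | fn);
}

/*
 * Unmasked error bits.  If the device is gone and config reads return ~0,
 * status and mask are both 0xffffffff and the result is 0.
 */
static uint32_t unmasked_status(uint32_t status, uint32_t mask)
{
	return status & ~mask;
}

/* Returns true if another message fits in the current window. */
static bool rl_allow(struct rl_state *rs, unsigned long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;	/* start a new window */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/* Linear lookup by Requester ID; NULL if no device matches. */
static struct fake_dev *find_dev_by_rid(struct fake_dev *devs, size_t n,
					uint16_t rid)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].rid == rid)
			return &devs[i];
	return NULL;
}

/*
 * One decision per error event, made against the source device's
 * ratelimit, so it can gate the pcieport line and the device lines
 * together.  Unknown sources are never suppressed.
 */
static bool aer_event_allowed(struct fake_dev *devs, size_t n, uint16_t rid,
			      unsigned long now)
{
	struct fake_dev *dev = find_dev_by_rid(devs, n, rid);

	if (!dev)
		return true;
	return rl_allow(&dev->cor_rs, now);
}
```

For the log above, "received from 0000:01:00.0" corresponds to make_rid(1, 0, 0) == 0x0100, and status/mask 00000040/0000e000 leaves 0x40 (bit 6, BadTLP) unmasked.  Making the decision once per event, rather than calling the limiter separately for the pcieport line, avoids the port message consuming a second slot from the device's burst.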