On Tue, Dec 29, 2015 at 09:58:22AM -0600, Bjorn Helgaas wrote:
> On Fri, Dec 18, 2015 at 11:30:33AM +0100, David Henningsson wrote:
> > Hi Linux PCI maintainers,
> >
> > My dmesg gets filled with a few lines repeated over and over again:
> >
> > pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
> > pcieport 0000:00:1c.0: can't find device of ID00e0
> > pcieport 0000:00:1c.0: AER: Corrected error received: id=00e0
> > pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected,
> > type=Physical Layer, id=00e0(Receiver ID)
> > pcieport 0000:00:1c.0: device [8086:9d14] error
> > status/mask=00000001/00002000
> > pcieport 0000:00:1c.0: [ 0] Receiver Error
> >
> > This happens 10-30 times per second (!), so dmesg fills up quickly.
> > The bug is present in both vanilla and Ubuntu kernels.
>
> This is a pretty obvious bug in our AER code.  We normally clear
> correctable errors by writing the PCI_ERR_COR_STATUS register in
> handle_error_source().  The execution path looks like this:
>
>   aer_isr_one_error
>     aer_print_port_info
>     if (find_source_device())
>       aer_process_err_devices
>         handle_error_source
>           pci_write_config_dword(dev, PCI_ERR_COR_STATUS, ...)
>
> In this case, find_source_device() printed "can't find device of
> ID00e0" [sic] and returned false, so we don't call
> aer_process_err_devices().  The error is never cleared, so
> we discover it again and again.
>
> I'll work on fixing this.  Incidentally, there's another report
> with similar symptoms here:
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=109691

I've thought about this problem a bit, but realistically I don't have
time to do the fix I'd like to do, which would involve reading the AER
status registers in the ISR and also *clearing* the error indication,
also in the ISR.  I think the current design, where we read bits of
the status in various places, and clear it in yet other locations, is
error-prone.

Anybody else who is interested should feel free to take a crack at it.
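
For anyone picking this up, here is a rough userspace sketch of the
failure mode and of the "read and clear in the ISR" shape described
above.  It is not kernel code: cor_status, write_cor_status(),
isr_current(), isr_clear_in_isr() and count_reports() are hypothetical
names made up for illustration, find_source_device() is hard-wired to
fail as in the report, and the only hardware behavior assumed is that
the correctable status bits are write-1-to-clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the correctable error status: Receiver Error (as in the log). */
#define COR_STATUS_RCVR 0x00000001u

/* Hypothetical stand-in for the device's PCI_ERR_COR_STATUS register. */
static uint32_t cor_status;

/* Status bits are write-1-to-clear: writing a 1 clears that bit. */
void write_cor_status(uint32_t val)
{
    cor_status &= ~val;
}

/* Models the reported case: the source device lookup always fails. */
bool find_source_device(void)
{
    return false;
}

/* Shape of the current path: the clear is only reached after a
 * successful lookup, so a failed lookup leaves the error pending. */
void isr_current(void)
{
    if (find_source_device())
        write_cor_status(cor_status);
}

/* Shape of the suggested fix: snapshot and clear the status in the
 * ISR itself, independent of whether the lookup succeeds. */
void isr_clear_in_isr(void)
{
    uint32_t status = cor_status;   /* read once, in the ISR */

    write_cor_status(status);       /* clear exactly the bits we saw */
    (void)find_source_device();     /* lookup only affects reporting */
}

/* Raise one Receiver Error, then run up to `polls` interrupt rounds.
 * Returns how many times the same error was reported. */
int count_reports(void (*isr)(void), int polls)
{
    int reports = 0;

    assert(polls >= 0);
    cor_status = COR_STATUS_RCVR;
    for (int i = 0; i < polls; i++) {
        if (cor_status == 0)
            break;                  /* nothing pending any more */
        reports++;
        isr();
    }
    return reports;
}
```

With these definitions, count_reports(isr_current, 5) returns 5 (the
same error is seen on every round), while
count_reports(isr_clear_in_isr, 5) returns 1.  The real code would of
course also have to deal with the root port's own status and with the
uncorrectable registers, which this sketch ignores.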

Bjorn