On Wed, 2020-07-15 at 17:12 -0500, Bjorn Helgaas wrote:
> > I've 'played' with PCIe error handling - without much success.
> > What might be useful is for a driver that has just read ~0u to
> > be able to ask 'has there been an error signalled for this device?'.
>
> In many cases a driver will know that ~0 is not a valid value for the
> register it's reading. But if ~0 *could* be valid, an interface like
> you suggest could be useful. I don't think we have anything like that
> today, but maybe we could. It would certainly be nice if the PCI core
> noticed, logged, and cleared errors. We have some of that for AER,
> but that's an optional feature, and support for the error bits in the
> garden-variety PCI_STATUS register is pretty haphazard. As you note
> below, this sort of SERR/PERR reporting is frequently hard-wired in
> ways that takes it out of our purview.

We do have pci_channel_state (via pci_channel_offline()), which covers
the cases where the underlying error handling (such as EEH or unplug)
results in the device being offlined. This tends to be asynchronous,
though, so a driver might read a few ~0's before it sees the offline
state. It's typically used to break potentially infinite loops in some
drivers.

There is no interface to check whether *an* error happened, though in
most cases it will be captured in the status register, which is
harvested (and cleared?) by some EDAC drivers, IIRC...

All this lacks coordination, I agree.

Cheers,
Ben.
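A minimal userspace sketch of the two ideas discussed above, under stated assumptions: a polling loop that uses an offline check to stop spinning once a read returns ~0u, and a "has an error been signalled?" helper that reads and clears the PCI_STATUS error bits. Everything prefixed `mock_`/`MOCK_`, plus `DEV_BUSY`, `wait_not_busy()` and `check_and_clear_status_errors()`, is hypothetical and only simulates hardware. Real driver code would use `struct pci_dev`, `pci_channel_offline()`, `readl()` and `pci_read_config_word()`/`pci_write_config_word()` on `PCI_STATUS`. The status-clearing helper illustrates the kind of interface the thread says does not exist today; it is not a kernel API. The PCI_STATUS bit values are the ones in `include/uapi/linux/pci_regs.h`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for pci_channel_state; not the kernel enum. */
enum mock_channel_state { MOCK_IO_NORMAL, MOCK_IO_FROZEN, MOCK_IO_PERM_FAILURE };

/* Hypothetical device model; not the real struct pci_dev. */
struct mock_pci_dev {
	enum mock_channel_state error_state; /* set asynchronously by EEH/AER/unplug */
	uint32_t ctrl_reg;                   /* device register with a BUSY bit */
	uint16_t status;                     /* simulated PCI_STATUS register */
};

#define DEV_BUSY 0x1u /* hypothetical BUSY bit in ctrl_reg */

/* PCI_STATUS error bits (values as in include/uapi/linux/pci_regs.h). */
#define PCI_STATUS_PARITY           0x0100 /* master data parity error */
#define PCI_STATUS_SIG_TARGET_ABORT 0x0800
#define PCI_STATUS_REC_TARGET_ABORT 0x1000
#define PCI_STATUS_REC_MASTER_ABORT 0x2000
#define PCI_STATUS_SIG_SYSTEM_ERROR 0x4000 /* SERR# asserted */
#define PCI_STATUS_DETECTED_PARITY  0x8000
#define MOCK_STATUS_ERROR_BITS                                   \
	(PCI_STATUS_PARITY | PCI_STATUS_SIG_TARGET_ABORT |       \
	 PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
	 PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)

/* Same idea as pci_channel_offline(): any state other than "normal". */
static bool mock_channel_offline(const struct mock_pci_dev *dev)
{
	return dev->error_state != MOCK_IO_NORMAL;
}

/* An offlined device reads back as all ones, like a failed MMIO read. */
static uint32_t mock_readl(const struct mock_pci_dev *dev)
{
	return mock_channel_offline(dev) ? ~0u : dev->ctrl_reg;
}

/*
 * Poll until BUSY clears. ~0u has BUSY set, so without the offline
 * check a dead device would keep this loop spinning until the timeout.
 */
static int wait_not_busy(const struct mock_pci_dev *dev, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		uint32_t v = mock_readl(dev);

		/* Only pay for the offline check when the value looks bogus. */
		if (v == ~0u && mock_channel_offline(dev))
			return -EIO;
		if (!(v & DEV_BUSY))
			return 0;
	}
	return -ETIMEDOUT;
}

/*
 * Hypothetical "has an error been signalled?" helper: report the error
 * bits currently set in PCI_STATUS and clear them. The real bits are
 * write-1-to-clear; here that is simulated by masking them off.
 */
static uint16_t check_and_clear_status_errors(struct mock_pci_dev *dev)
{
	uint16_t errs = dev->status & MOCK_STATUS_ERROR_BITS;

	dev->status &= (uint16_t)~errs;
	return errs;
}
```

Because offlining is asynchronous, the bounded poll count is still needed: the first few ~0u reads may arrive before the channel state changes, and in that case the loop falls through to `-ETIMEDOUT` rather than `-EIO`.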