On Thu, 2018-08-16 at 16:51 +1000, Benjamin Herrenschmidt wrote:
> > referring to Spec 6.2.2.2.1 Fatal Errors, where the link is unreliable and it
> > might need an AER style reset of the link or DPC style HW recovery.
> > In both cases, the shutdown callbacks are expected to be called,
>
> No, this is wrong and not the intent of the error handling.
>
> You seem to be applying PCIe specific concepts brain-farted at Intel
> that are way way away from what we care about in practice and in Linux.
>
> > e.g. some drivers handle ERR_NONFATAL or FATAL errors in similar ways,
> > e.g.
> > ioat_pcie_error_detected() calls ioat_shutdown() in case of
> > ERR_NONFATAL,
> > otherwise ioat_shutdown() in case of ERR_FATAL.
>
> Since when do the error handling callbacks even have the concept of FATAL
> vs. non-fatal? This doesn't appear anywhere in the prototype of the
> struct pci_error_handlers and shouldn't.

In fact, looking at pcie_do_nonfatal_recovery(), it's indeed completely
broken. It tells the driver that the slot was reset without actually
resetting anything...

Ugh.

Ben.
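To illustrate the point about the prototype: the error_detected() callback in struct pci_error_handlers is handed a channel state (normal, frozen, permanent failure), not an AER severity, so that state is all a driver can branch on. The sketch below is a userspace mock, not the kernel header. The enum and result names mirror include/linux/pci.h, but the numeric values, the trimmed-down struct and the example_error_detected() driver callback are assumptions made up for illustration.

```c
/* Userspace mock only. Names mirror include/linux/pci.h, but this is
 * NOT the kernel header and the values are illustrative assumptions. */
enum pci_channel_state {
	pci_channel_io_normal = 1,       /* I/O channel is in normal state */
	pci_channel_io_frozen = 2,       /* I/O to the channel is blocked */
	pci_channel_io_perm_failure = 3, /* device is dead */
};

typedef enum {
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
} pci_ers_result_t;

struct pci_dev; /* opaque in this mock */

/* Trimmed: the real struct also has mmio_enabled, slot_reset, resume, etc. */
struct pci_error_handlers {
	pci_ers_result_t (*error_detected)(struct pci_dev *dev,
					   enum pci_channel_state state);
};

/* Hypothetical driver callback: note that there is no FATAL/NON_FATAL
 * argument, only the state of the I/O channel. */
static pci_ers_result_t example_error_detected(struct pci_dev *dev,
					       enum pci_channel_state state)
{
	(void)dev; /* unused in this mock */

	switch (state) {
	case pci_channel_io_normal:
		/* MMIO still works: try to recover without a reset. */
		return PCI_ERS_RESULT_CAN_RECOVER;
	case pci_channel_io_frozen:
		/* I/O is blocked: quiesce and ask for a slot reset. */
		return PCI_ERS_RESULT_NEED_RESET;
	case pci_channel_io_perm_failure:
	default:
		/* Nothing left to do but detach. */
		return PCI_ERS_RESULT_DISCONNECT;
	}
}

const struct pci_error_handlers example_err_handler = {
	.error_detected = example_error_detected,
};
```

A driver returning PCI_ERS_RESULT_NEED_RESET only asks for a reset; whether the slot is actually reset afterwards is up to the core recovery code.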