On 2018-08-16 12:29, Benjamin Herrenschmidt wrote:
On Thu, 2018-08-16 at 16:51 +1000, Benjamin Herrenschmidt wrote:
> Referring to the spec, 6.2.2.2.1 Fatal Errors: the link is unreliable and it
> might need an AER-style reset of the link or DPC-style HW recovery.
> In both cases, the shutdown callbacks are expected to be called,
No, this is wrong and not the intent of the error handling.
You seem to be applying PCIe specific concepts brain-farted at Intel
that are way way away from what we care about in practice and in
Linux.
> e.g. some drivers handle ERR_NONFATAL and ERR_FATAL errors in similar ways,
> e.g. ioat_pcie_error_detected() calls ioat_shutdown() in case of
> ERR_NONFATAL, and also calls ioat_shutdown() in case of ERR_FATAL.
Since when do the error handling callbacks even have the concept of fatal
vs. non-fatal? This doesn't appear anywhere in the prototype of
struct pci_error_handlers, and it shouldn't.
In fact, looking at pcie_do_nonfatal_recovery(), it's indeed completely
broken. It tells the driver that the slot was reset without actually
resetting anything... Ugh.
Ben.
pcie_do_nonfatal_recovery() exhibits the same behavior with or without
the patch series.
In short, the series brings no functional change to
pcie_do_nonfatal_recovery().
Regards,
Oza.