Hi Bjorn, On 3/24/20 4:49 PM, Bjorn Helgaas wrote:
I don't understand why hotplug is relevant here. This path (dpc_reset_link()) is only used for downstream ports that support DPC. DPC has already disabled the link, which resets everything below the port, regardless of whether the port supports hotplug. I do see that PCI_ERS_RESULT_NEED_RESET seems to promise a lot more than it actually*does*. The doc (pci-error-recovery.rst) says .error_detected() can return PCI_ERS_RESULT_NEED_RESET to*request* a slot reset. But if that happens, pcie_do_recovery() doesn't do a reset at all. It calls the driver's .slot_reset() method, which tells the driver "we've reset your device; please re-initialize the hardware." I guess this abuses PCI_ERS_RESULT_NEED_RESET by taking advantage of that implementation deficiency in pcie_do_recovery(): we know the downstream devices have already been reset via DPC, and returning PCI_ERS_RESULT_NEED_RESET means we'll call .slot_reset() to tell the driver about that reset. I can see how this achieves the desired result, but if/when we fix pcie_do_recovery() to actually*do* the reset promised by PCI_ERS_RESULT_NEED_RESET, we will be doing*two* resets: the first via DPC and a second via whatever slot reset mechanism pcie_do_recovery() would use.
When we fix this issue, if we make sure the reset logic is implemented before we call .reset_link callback we should be able to avoid resetting the device twice. Before we call DPC .reset_link callback, the device link will not up and hence we should not able to reset it.
So I guess the real issue (as you allude to in the commit log) is that we rely on hotplug to unbind/rebind the driver, and without hotplug we need to at least tell the driver the device was reset.
Agree
I'll try to expand the comment here so it reminds me what's going on when we have to look at this again:) Let me know if I'm on the right track.
Yes, your understanding is correct.