On 9/22/20 4:33 PM, Bjorn Helgaas wrote:
On Tue, Sep 22, 2020 at 02:44:51PM -0700, Kuppuswamy, Sathyanarayanan wrote:
On 9/22/20 11:52 AM, Bjorn Helgaas wrote:
On Fri, Jul 24, 2020 at 12:07:55PM -0700, sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx wrote:
From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
The current pcie_do_recovery() implementation has the following two issues:
I'm having trouble parsing this out, probably just my lack of
understanding...
1. Fatal (DPC) error recovery is currently broken for non-hotplug
capable devices. The current fatal error recovery implementation relies
on the PCIe hotplug (pciehp) handler to detach and re-enumerate
the affected devices/drivers. The pciehp handler listens for DLLSC state
changes: it detaches devices/drivers on a DLLSC_LINK_DOWN event
and re-enumerates them on a DLLSC_LINK_UP event. So for
non-hotplug capable devices, the recovery code does not restore the state
of the affected devices correctly.
Apparently in the hotplug case, something *does* restore the state of
affected devices?
Yes, in the hotplug case, the DLLSC state change handler takes over
detachment/cleanup and re-attachment of the affected devices/drivers.
Where does the restore happen here? I.e., what function does this?
A DLLSC link down event removes the affected devices/drivers, and a
link up event re-creates all the devices.
On a DLLSC link down event:
  ->pciehp_ist()
    ->pciehp_handle_presence_or_link_change()
      ->pciehp_disable_slot()
        ->__pciehp_disable_slot()
          ->remove_board()
            ->pciehp_unconfigure_device()

On a DLLSC link up event:
  ->pciehp_ist()
    ->pciehp_handle_presence_or_link_change()
      ->pciehp_enable_slot()
        ->__pciehp_enable_slot()
          ->board_added()
            ->pciehp_configure_device()
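
To make the two paths concrete, here is a minimal user-space sketch of
the dispatch, not the actual pciehp code. Everything named toy_* and the
link_event enum are hypothetical stand-ins I made up for illustration;
the real handler decodes the event from the slot status and does far
more work (locking, state machine, power control). The sketch only
shows that a link down event tears the devices down, a link up event
configures them again, and a port without hotplug support gets neither.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes, assumed for this sketch only. */
enum link_event { LINK_DOWN, LINK_UP };

/* Toy stand-in for a slot: tracks only whether child devices are set up. */
struct toy_slot {
	bool hotplug_capable;    /* is a hotplug handler listening at all? */
	bool devices_configured; /* are the devices below the port bound? */
	int  reconfigure_count;  /* how many times devices were re-created */
};

/* Models the pciehp_disable_slot() ... pciehp_unconfigure_device() leg. */
void toy_disable_slot(struct toy_slot *slot)
{
	slot->devices_configured = false;
}

/* Models the pciehp_enable_slot() ... pciehp_configure_device() leg. */
void toy_enable_slot(struct toy_slot *slot)
{
	slot->devices_configured = true;
	slot->reconfigure_count++;
}

/*
 * Models pciehp_handle_presence_or_link_change().
 * Returns true if the event was handled, false if nobody handles it,
 * which is the non-hotplug case described in issue 1.
 */
bool toy_handle_link_change(struct toy_slot *slot, enum link_event ev)
{
	assert(slot != NULL);

	if (!slot->hotplug_capable)
		return false;

	if (ev == LINK_DOWN)
		toy_disable_slot(slot);
	else
		toy_enable_slot(slot);

	return true;
}
```

Feeding LINK_DOWN followed by LINK_UP to a hotplug capable toy_slot
leaves devices_configured set and reconfigure_count at 1. The same two
events on a slot with hotplug_capable cleared return false and change
nothing, which is the gap this patch is about.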
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer