On Thu, Oct 15, 2020 at 1:06 AM Kuppuswamy, Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote: > > > > On 10/14/20 8:07 AM, Ethan Zhao wrote: > > On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan > > <sathyanarayanan.nkuppuswamy@xxxxxxxxx> wrote: > >> > >> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") > >> merged fatal and non-fatal error recovery paths, and also made > >> recovery code depend on hotplug handler for "remove affected > >> device + rescan" support. But this change also complicated the > >> error recovery path and which in turn led to the following > >> issues. > >> > >> 1. We depend on hotplug handler for removing the affected > >> devices/drivers on DLLSC LINK down event (on DPC event > >> trigger) and DPC handler for handling the error recovery. Since > >> both handlers operate on same set of affected devices, it leads > >> to race condition, which in turn leads to NULL pointer > >> exceptions or error recovery failures.You can find more details > >> about this issue in following link. > >> > >> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@xxxxxxxxx/T/#t > >> > >> 2. For non-hotplug capable devices fatal (DPC) error recovery > >> is currently broken. Current fatal error recovery implementation > >> relies on PCIe hotplug (pciehp) handler for detaching and > >> re-enumerating the affected devices/drivers. So when dealing with > >> non-hotplug capable devices, recovery code does not restore the state > >> of the affected devices correctly. You can find more details about > >> this issue in the following links. > >> > >> https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@xxxxxxx/ > >> https://lore.kernel.org/linux-pci/12115.1588207324@famine/ > >> https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx/ > >> > >> In order to fix the above two issues, we should stop relying on hotplug > > Yes, it doesn't rely on hotplug handler to remove and rescan the device, > > but it couldn't prevent hotplug drivers from doing another replicated > > removal/rescanning. > > it doesn't make sense to leave another useless removal/rescanning there. > > Maybe that's why these two paths were merged to one and made it rely on > > hotplug. > No, as per PCIe spec, hotplug and DPC has no functional dependency. Hence > depending on it to handle some of its recovery function is in-correct and > would lead to issues in non-hotplug capable platforms (which is true > currently). > > > > >> + else > >> + udev = dev->bus->self; > >> + > >> + parent = udev->subordinate; > >> + pci_walk_bus(parent, pci_dev_set_disconnected, NULL); > >> + > >> + pci_lock_rescan_remove(); > > Though here you have lock, but hotplug will do another > > 'pci_stop_and_remove_bus_device()' > > without merging it with the hotplug driver, you have no way to > > remove the replicated actions in > > hotplug handler. > No, the core operation (remove/add device) is syncronzied and done in > only one thread. Please check the following flow. Even in hotplug pci_lock_rescan_remove() is global lock for PCIe, the mal-functional device's port holds this lock, it prevents the whole system from doing hot-plug operation. Though pciehp is not so hot/scalable and performance critical, but there is per cpu thread to handle hot-plug operation. synchronize all threads make them walk backwards for scalability. > handler, before removing the device, it attempts to hold pci_lock_rescan_remove() > lock. So holding the same lock in DPC handler will syncronize the DPC/hotplug > handlers. Also if one of the thread (DPC or hotplug) removes/adds the affected devices, > other thread will not repeat the same action (since the device is already removed/added). > > ->pciehp_ist() > ->pciehp_handle_presence_or_link_change() > ->pciehp_disable_slot() > ->__pciehp_disable_slot() > ->remove_board() > ->pciehp_unconfigure_device() > ->pci_lock_rescan_remove() > > > > > > Thanks, > > Ethan > >> + pci_dev_get(dev); > >> + list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, > >> + bus_list) { > >> + pci_stop_and_remove_bus_device(pdev); > >> + } > >> + > >> + result = reset_link(udev); > >> + > >> + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { > >> + /* > >> + * If the error is reported by a bridge, we think this error > >> + * is related to the downstream link of the bridge, so we > >> + * do error recovery on all subordinates of the bridge instead > >> + * of the bridge and clear the error status of the bridge. > >> + */ > >> + pci_aer_clear_fatal_status(dev); > >> + if (pcie_aer_is_native(dev)) > >> + pcie_clear_device_status(dev); > >> + } > >> + > >> + if (result == PCI_ERS_RESULT_RECOVERED) { > >> + if (pcie_wait_for_link(udev, true)) > > And another pci_rescan_bus() like in the hotplug handler. > As I have mentioned before, holding the same lock should make them synchronized > and not repeat the underlying functionality of pci_rescan_bus() in both threads > at the same time. Yes, it blocked them all. Thanks, Ethan > >> + pci_rescan_bus(udev->bus); > >> + pci_info(dev, "Device recovery from fatal error successful\n"); > >> + } else { > >> + pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); > >> + pci_info(dev, "Device recovery from fatal error failed\n"); > > >> -- > >> 2.17.1 > >> > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer