On Fri, Feb 09, 2024 at 10:32:16AM +0530, Nilay Shroff wrote:
> @@ -2776,6 +2776,14 @@ static void nvme_reset_work(struct work_struct *work)
>   out_unlock:
> 	mutex_unlock(&dev->shutdown_lock);
>   out:
> +	/*
> +	 * If PCI recovery is ongoing then let it finish first
> +	 */
> +	if (pci_channel_offline(to_pci_dev(dev->dev))) {
> +		dev_warn(dev->ctrl.device, "PCI recovery is ongoing so let it finish\n");
> +		return;
> +	}
> +
> 	/*
> 	 * Set state to deleting now to avoid blocking nvme_wait_reset(), which
> 	 * may be holding this pci_dev's device lock.
> @@ -3295,9 +3303,11 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
> 	case pci_channel_io_frozen:
> 		dev_warn(dev->ctrl.device,
> 			"frozen state error detected, reset controller\n");
> -		if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
> -			nvme_dev_disable(dev, true);
> -			return PCI_ERS_RESULT_DISCONNECT;
> +		if (nvme_ctrl_state(&dev->ctrl) != NVME_CTRL_RESETTING) {
> +			if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
> +				nvme_dev_disable(dev, true);
> +				return PCI_ERS_RESULT_DISCONNECT;
> +			}
> 		}
> 		nvme_dev_disable(dev, false);
> 		return PCI_ERS_RESULT_NEED_RESET;

I get what you're trying to do, but it looks racy. The reset_work may
finish before PCI sets the channel offline, or the error handling work
may happen to see the RESETTING state but then the controller
transitions to CONNECTING afterward and deadlocks on the '.resume()'
side (a toy illustration of that check-then-act pattern is at the end
of this mail). You are counting on a very specific sequence tied to the
PCIe error handling module, and maybe you can count on that sequence
for your platform in this unique scenario, but these link errors could
happen at any time.

And nvme subsystem reset is just odd; it's not clear how it was
intended to be handled. It takes the links down, so it seems to require
re-enumeration from a PCIe hotplug driver, and that's kind of how it
was expected to work here, but your platform has a special way to
contain the link event and bring things back up the way they were
before.

And the fact that you *require* IO to be in flight just so the timeout
handler can dispatch a non-posted transaction 30 seconds later to
trigger EEH is also odd. Why can't EEH just detect the link-down event
directly?

This driver unfortunately doesn't handle errors during a reset well.
Trying to handle that has been problematic, so the driver just bails if
anything goes wrong at this critical initialization point. Maybe we
need to address the reset/initialization failure handling more
generically and delegate the teardown-or-retry decision to something
else (also sketched at the end of this mail). Messing with that is
pretty fragile right now, though. Or you could just re-enumerate the
slot.

I don't know; sorry, my message is not really helping much to get this
fixed.
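To spell out the class of problem I mean by "racy", here's a tiny
user-space model of the check-then-act pattern. This is toy code, not
driver code; every name in it is made up for illustration only. One
thread plays the error handler sampling the state, the other plays the
reset work moving the state along, and whatever the handler decided can
already be stale by the time it acts on it:

/* toy-race.c: build with `cc -pthread toy-race.c` */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

enum toy_state { TOY_RESETTING, TOY_CONNECTING };

static _Atomic int state = TOY_RESETTING;

/* plays the role of the error handler: decides based on a snapshot */
static void *toy_error_handler(void *unused)
{
	int seen = atomic_load(&state);

	if (seen == TOY_RESETTING)
		sched_yield();	/* window where the reset work keeps going */

	/* the decision above used 'seen', which may be stale by now */
	printf("handler sampled %d, state is now %d\n",
	       seen, atomic_load(&state));
	return NULL;
}

/* plays the role of reset_work: moves the state along regardless */
static void *toy_reset_work(void *unused)
{
	atomic_store(&state, TOY_CONNECTING);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, toy_error_handler, NULL);
	pthread_create(&b, NULL, toy_reset_work, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}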
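And to be a little more concrete about delegating the teardown-or-retry
decision, here's an equally made-up sketch of a single policy point
that every failed reset/init path reports to, which then chooses
between deferring to platform recovery, retrying, or tearing down.
Again, this only illustrates the shape of the idea, not a proposal for
the actual driver structure or naming:

/* toy-policy.c: build with `cc toy-policy.c` */
#include <stdbool.h>
#include <stdio.h>

enum toy_fail_reason { TOY_FAIL_INIT, TOY_FAIL_LINK_DOWN, TOY_FAIL_TIMEOUT };

struct toy_ctrl {
	int reset_attempts;
	bool recovery_in_progress;	/* e.g. platform recovery still running */
};

/*
 * Single decision point: failure sites report what happened and this
 * chooses what to do next, instead of each site bailing on its own.
 */
static void toy_reset_failed(struct toy_ctrl *ctrl, enum toy_fail_reason why)
{
	if (ctrl->recovery_in_progress) {
		printf("defer: let platform recovery finish, then retry\n");
		return;
	}
	if (why == TOY_FAIL_INIT && ctrl->reset_attempts < 2) {
		ctrl->reset_attempts++;
		printf("retry reset, attempt %d\n", ctrl->reset_attempts);
		return;
	}
	printf("tear the controller down\n");
}

int main(void)
{
	struct toy_ctrl ctrl = { .reset_attempts = 0,
				 .recovery_in_progress = true };

	toy_reset_failed(&ctrl, TOY_FAIL_INIT);		/* deferred */
	ctrl.recovery_in_progress = false;
	toy_reset_failed(&ctrl, TOY_FAIL_INIT);		/* retried */
	toy_reset_failed(&ctrl, TOY_FAIL_TIMEOUT);	/* torn down */
	return 0;
}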