On 05/10/2018 02:14 PM, Keith Busch wrote:
> On Thu, May 10, 2018 at 01:56:56PM -0500, Alex G. wrote:
>>> @@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
>>>
>>>  	dev_info(dev->ctrl.device, "restart after slot reset\n");
>>>  	pci_restore_state(pdev);
>>> -	nvme_reset_ctrl(&dev->ctrl);
>>> -	return PCI_ERS_RESULT_RECOVERED;
>>> +	nvme_reset_ctrl_sync(&dev->ctrl);
>>
>> This does wonders when nvme_reset_ctrl_sync() returns in a timely
>> manner. I was also able to get the nvme drive in a state where
>> nvme_reset_ctrl_sync() does not return. Then we end up with the device
>> lock in report_slot_reset, which, as you may imagine, is not a great thing.
>
> It never returns? That shouldn't happen. There are cases where it may take
> a very long time, depending on what the controller reports in CAP.TO. The
> only other case it may stall is if the controller never responds to the
> initialization admin commands, but that should delay by 60 seconds under
> default parameters.

Took 28 minutes before I gave up and rebooted the machine. Maybe I
should have waited 30. Even 60 seconds seems like a terribly long time
to wait in AER. Simple stuff like block IO and 'nvme list' hangs in
kernel space this entire time.

I can raise a separate issue once I find a reliable way to repro.

Alex