On Fri, May 11, 2018 at 2:56 AM, Alex G. <mr.nuke.me@xxxxxxxxx> wrote:
>
>
> On 05/10/2018 11:01 AM, Keith Busch wrote:
>> AER handling expects a successful return from slot_reset means the
>> driver made the device functional again. The nvme driver had been using
>> an asynchronous reset to recover the device, so the device
>> may still be initializing after control is returned to the
>> AER handler. This creates problems for subsequent event handling,
>> causing the initialization to fail.
>>
>> This patch fixes that by syncing the controller reset before returning
>> to the AER driver, and reporting the true state of the reset.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199657
>> Reported-by: Alex Gagniuc <mr.nuke.me@xxxxxxxxx>
>
> Tested-by: Alex Gagniuc <mr.nuke.me@xxxxxxxxx>
>
> Sponsored-by: DellEMC
> You know I had to add that plug somewhere :p
>
>> Cc: Sinan Kaya <okaya@xxxxxxxxxxxxxx>
>> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
>> Cc: <stable@xxxxxxxxxxxxxxx>
>> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx>
>> ---
>>  drivers/nvme/host/pci.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index b542dce45927..2e221796257a 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
>>
>>  	dev_info(dev->ctrl.device, "restart after slot reset\n");
>>  	pci_restore_state(pdev);
>> -	nvme_reset_ctrl(&dev->ctrl);
>> -	return PCI_ERS_RESULT_RECOVERED;
>> +	nvme_reset_ctrl_sync(&dev->ctrl);
>
> This does wonders when nvme_reset_ctrl_sync() returns in a timely
> manner. I was also able to get the nvme drive in a state where
> nvme_reset_ctrl_sync() does not return. Then we end up with the device
> lock in report_slot_reset, which, as you may imagine, is not a great thing.
>
> I think this step is a move in the better direction, but we still have
> problems.

If IOs from nvme_reset_work() time out, nvme_reset_ctrl_sync() may never
return, though I am not sure that is your case. You can find where it
hangs with 'ps -ax | grep D' and then 'cat /proc/$PID/stack'.

--
Ming Lei
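
The debugging step suggested above can be scripted. What follows is only a rough sketch, not something from the patch or the thread: it reads the "State:" field of /proc/PID/status rather than grepping ps output, so it matches only tasks that really are in uninterruptible sleep (state D). The function names and the optional proc-root argument are made up for illustration. Reading /proc/PID/stack normally needs root and a kernel built with stack trace support.

```shell
#!/bin/sh
# Sketch: find tasks stuck in uninterruptible sleep (state D) and show
# where in the kernel they are blocked.
# Assumption: the optional first argument (proc root) is a hypothetical
# convenience so the functions can be pointed at a fake tree; it
# defaults to /proc.

# Print the PID of every task whose status file reports state D.
list_d_state_pids() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        # The line of interest looks like: "State:<TAB>D (disk sleep)"
        state=$(awk '/^State:/ { print $2; exit }' "$status")
        if [ "$state" = "D" ]; then
            pid=${status%/status}   # strip the trailing "/status"
            pid=${pid##*/}          # keep only the PID directory name
            echo "$pid"
        fi
    done
}

# Print a header and the kernel stack for each D-state task.
dump_d_state_stacks() {
    proc_root="${1:-/proc}"
    for pid in $(list_d_state_pids "$proc_root"); do
        name=$(awk '/^Name:/ { print $2; exit }' "$proc_root/$pid/status")
        echo "== $pid ($name) =="
        cat "$proc_root/$pid/stack" 2>/dev/null || echo "(stack unavailable)"
    done
}
```

Running dump_d_state_stacks as root with no argument walks the live /proc. If the reset is stuck, a task blocked under nvme_reset_ctrl_sync() or nvme_reset_work() would be expected to show up in the output, but that depends on where the hang actually is.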