On 05/10/2018 11:01 AM, Keith Busch wrote:
> AER handling expects a successful return from slot_reset means the
> driver made the device functional again. The nvme driver had been using
> an asynchronous reset to recover the device, so the device
> may still be initializing after control is returned to the
> AER handler. This creates problems for subsequent event handling,
> causing the initialization to fail.
>
> This patch fixes that by syncing the controller reset before returning
> to the AER driver, and reporting the true state of the reset.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199657
> Reported-by: Alex Gagniuc <mr.nuke.me@xxxxxxxxx>

Tested-by: Alex Gagniuc <mr.nuke.me@xxxxxxxxx>
Sponsored-by: DellEMC

You know I had to add that plug somewhere :p

> Cc: Sinan Kaya <okaya@xxxxxxxxxxxxxx>
> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx>
> ---
>  drivers/nvme/host/pci.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index b542dce45927..2e221796257a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
>
>  	dev_info(dev->ctrl.device, "restart after slot reset\n");
>  	pci_restore_state(pdev);
> -	nvme_reset_ctrl(&dev->ctrl);
> -	return PCI_ERS_RESULT_RECOVERED;
> +	nvme_reset_ctrl_sync(&dev->ctrl);

This does wonders when nvme_reset_ctrl_sync() returns in a timely
manner. I was also able to get the nvme drive into a state where
nvme_reset_ctrl_sync() does not return. Then we end up stuck holding
the device lock in report_slot_reset, which, as you may imagine, is
not a great thing.

I think this is a step in the right direction, but we still have
problems.

Alex

> +	switch (dev->ctrl.state) {
> +	case NVME_CTRL_LIVE:
> +	case NVME_CTRL_ADMIN_ONLY:
> +		return PCI_ERS_RESULT_RECOVERED;
> +	default:
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
>  }
>
>  static void nvme_error_resume(struct pci_dev *pdev)
>
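
For anyone following along, here is a rough userspace model of the state
check the quoted hunk adds. It is only a sketch: the model_* enums and
functions are made-up stand-ins for the driver's controller states and for
pci_ers_result_t, not the kernel definitions, and the set of states listed
is an assumption for illustration.

```c
#include <assert.h>

/* Userspace model only; these are hypothetical stand-ins, not kernel types. */
enum model_ctrl_state {
	MODEL_CTRL_NEW,
	MODEL_CTRL_LIVE,
	MODEL_CTRL_ADMIN_ONLY,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_DEAD,
};

/* Stand-in for the two pci_ers_result_t values the patch returns. */
enum model_ers_result {
	MODEL_ERS_RECOVERED,
	MODEL_ERS_DISCONNECT,
};

/*
 * Mirrors the switch in the quoted hunk: only a controller that ended up
 * live (fully or admin-only) after the synchronous reset is reported as
 * recovered; every other state is reported as disconnected.
 */
enum model_ers_result model_slot_reset_result(enum model_ctrl_state state)
{
	switch (state) {
	case MODEL_CTRL_LIVE:
	case MODEL_CTRL_ADMIN_ONLY:
		return MODEL_ERS_RECOVERED;
	default:
		return MODEL_ERS_DISCONNECT;
	}
}

/* Example: a controller still resetting must not be reported as recovered. */
void model_example(void)
{
	assert(model_slot_reset_result(MODEL_CTRL_RESETTING) == MODEL_ERS_DISCONNECT);
	assert(model_slot_reset_result(MODEL_CTRL_LIVE) == MODEL_ERS_RECOVERED);
}
```

Note that this only models the return value. It says nothing about the
case described above where the synchronous reset never returns, since the
check is never reached then.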