Re: SSD surprise removal leads to long wait inside pci_dev_wait() and FLR 65s timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 2 Jun 2019 19:44:14 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> [+cc Alex, Lukas]
> 
> On Fri, May 31, 2019 at 09:55:20AM -0700, JD Zheng wrote:
> > Hello,
> > 
> > I am running DPDK 18.11+SPDK 19.04 with v5.1 kernel. DPDK/SPDK uses SSD vfio
> > devices and after running SPDK's nvmf_tgt, unplugging a SSD cause kernel to
> > print out following:
> > [  105.426952] vfio-pci 0000:04:00.0: not ready 2047ms after FLR; waiting
> > [  107.698953] vfio-pci 0000:04:00.0: not ready 4095ms after FLR; waiting
> > [  112.050960] vfio-pci 0000:04:00.0: not ready 8191ms after FLR; waiting
> > [  120.498953] vfio-pci 0000:04:00.0: not ready 16383ms after FLR; waiting
> > [  138.418957] vfio-pci 0000:04:00.0: not ready 32767ms after FLR; waiting
> > [  173.234953] vfio-pci 0000:04:00.0: not ready 65535ms after FLR; giving up
> > 
> > Looks like it is a PCI hotplug racing condition between DPDK's
> > eal-intr-thread thread and kernel's pciehp thread. And it causes lockup in
> > pci_dev_wait() at kernel side.
> > 
> > When SSD is removed, eal-intr-thread immediately receives
> > RTE_INTR_HANDLE_ALARM and handler calls rte_pci_detach_dev() and at kernel
> > side vfio_pci_release() is triggered to release this vfio device, which
> > calls pci_try_reset_function(), then _pci_reset_function_locked().
> > pci_try_reset_function acquires the device lock but
> > _pci_reset_function_locked() doesn't return, therefore lock is NOT released.

To what extent does vfio-pci need to learn about surprise hotplug?  My
expectation is that the current state of the code would only support
cooperative hotplug.  When a device is surprise removed, what backs a
user's mmaps?  AIUI, we don't have a revoke interface to invalidate
these.  We should probably start with an RFE or some development effort
to harden vfio-pci for surprise hotplug, it's not surprising it doesn't
just work TBH.  Thanks,

Alex


> > Inside _pci_reset_function_locked(), pcie_has_flr(), pci_pm_reset(), etc.
> > call pci_dev_wait() at the end but it doesn't return and print out above
> > message until 65s timeout.
> > 
> > At kernel pciehp side, it also detects the removal but doesn't run
> > immediately as it is configured as "pciehp.pciehp_poll_time=5". So a couple
> > of seconds later, it calls pciehp_unconfigure_device -> pci_walk_bus ->
> > pci_dev_set_disconnected. pci_dev_set_disconnected() couldn't get the device
> > lock and is stuck too because the lock is hold by eal-intr-thread.
> > 
> > The first issue is in pci_dev_wait(). It calls pci_read_config_dword() and
> > only when id is not all ones, it can return. But when SSD is physically
> > removed, id retrieved is always all ones therefore, it has to wait for FLR
> > 65s timeout to return.
> > 
> > I did the following to check return value of pci_read_config_dword() to fix
> > this:
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4439,7 +4439,11 @@ static int pci_dev_wait(struct pci_dev *dev, char
> > *reset_type, int timeout)
> > 
> >                 msleep(delay);
> >                 delay *= 2;
> > -               pci_read_config_dword(dev, PCI_COMMAND, &id);
> > +               if (pci_read_config_dword(dev, PCI_COMMAND, &id) ==
> > +                   PCIBIOS_DEVICE_NOT_FOUND) {
> > +                       pci_info(dev, "device disconnected\n");
> > +                       return -ENODEV;
> > +               }
> >         }
> > 
> >         if (delay > 1000)
> > 
> > The second issue is that due to lock up described above, the
> > pci_dev_set_disconnected() is stuck and pci_read_config_dword() won't return
> > PCIBIOS_DEVICE_NOT_FOUND.
> > 
> > I didn't find a easy way to fix it. Maybe use device lock in
> > pci_dev_set_disconnected() is too coarse and we need a finer device
> > err_state lock?
> > 
> > BTW, pci_dev_set_disconnected wasn't using device lock until this change
> > a6bd101b8f.
> > 
> > Any suggestions to fix this problem?  
> 
> Would you mind opening a report at https://bugzilla.kernel.org and
> attaching the complete dmesg log and "lspci -vv" output?
> 
> Out of curiosity, why do you use "pciehp.pciehp_poll_time=5"?
> 
> Bjorn




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux