[+cc Alex, Lukas] On Fri, May 31, 2019 at 09:55:20AM -0700, JD Zheng wrote: > Hello, > > I am running DPDK 18.11+SPDK 19.04 with v5.1 kernel. DPDK/SPDK uses SSD vfio > devices and after running SPDK's nvmf_tgt, unplugging a SSD cause kernel to > print out following: > [ 105.426952] vfio-pci 0000:04:00.0: not ready 2047ms after FLR; waiting > [ 107.698953] vfio-pci 0000:04:00.0: not ready 4095ms after FLR; waiting > [ 112.050960] vfio-pci 0000:04:00.0: not ready 8191ms after FLR; waiting > [ 120.498953] vfio-pci 0000:04:00.0: not ready 16383ms after FLR; waiting > [ 138.418957] vfio-pci 0000:04:00.0: not ready 32767ms after FLR; waiting > [ 173.234953] vfio-pci 0000:04:00.0: not ready 65535ms after FLR; giving up > > Looks like it is a PCI hotplug racing condition between DPDK's > eal-intr-thread thread and kernel's pciehp thread. And it causes lockup in > pci_dev_wait() at kernel side. > > When SSD is removed, eal-intr-thread immediately receives > RTE_INTR_HANDLE_ALARM and handler calls rte_pci_detach_dev() and at kernel > side vfio_pci_release() is triggered to release this vfio device, which > calls pci_try_reset_function(), then _pci_reset_function_locked(). > pci_try_reset_function acquires the device lock but > _pci_reset_function_locked() doesn't return, therefore lock is NOT released. > > Inside _pci_reset_function_locked(), pcie_has_flr(), pci_pm_reset(), etc. > call pci_dev_wait() at the end but it doesn't return and print out above > message until 65s timeout. > > At kernel pciehp side, it also detects the removal but doesn't run > immediately as it is configured as "pciehp.pciehp_poll_time=5". So a couple > of seconds later, it calls pciehp_unconfigure_device -> pci_walk_bus -> > pci_dev_set_disconnected. pci_dev_set_disconnected() couldn't get the device > lock and is stuck too because the lock is hold by eal-intr-thread. > > The first issue is in pci_dev_wait(). It calls pci_read_config_dword() and > only when id is not all ones, it can return. But when SSD is physically > removed, id retrieved is always all ones therefore, it has to wait for FLR > 65s timeout to return. > > I did the following to check return value of pci_read_config_dword() to fix > this: > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4439,7 +4439,11 @@ static int pci_dev_wait(struct pci_dev *dev, char > *reset_type, int timeout) > > msleep(delay); > delay *= 2; > - pci_read_config_dword(dev, PCI_COMMAND, &id); > + if (pci_read_config_dword(dev, PCI_COMMAND, &id) == > + PCIBIOS_DEVICE_NOT_FOUND) { > + pci_info(dev, "device disconnected\n"); > + return -ENODEV; > + } > } > > if (delay > 1000) > > The second issue is that due to lock up described above, the > pci_dev_set_disconnected() is stuck and pci_read_config_dword() won't return > PCIBIOS_DEVICE_NOT_FOUND. > > I didn't find a easy way to fix it. Maybe use device lock in > pci_dev_set_disconnected() is too coarse and we need a finer device > err_state lock? > > BTW, pci_dev_set_disconnected wasn't using device lock until this change > a6bd101b8f. > > Any suggestions to fix this problem? Would you mind opening a report at https://bugzilla.kernel.org and attaching the complete dmesg log and "lspci -vv" output? Out of curiosity, why do you use "pciehp.pciehp_poll_time=5"? Bjorn