On Sun, 2 Jun 2019 19:44:14 -0500 Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > [+cc Alex, Lukas] > > On Fri, May 31, 2019 at 09:55:20AM -0700, JD Zheng wrote: > > Hello, > > > > I am running DPDK 18.11+SPDK 19.04 with v5.1 kernel. DPDK/SPDK uses SSD vfio > > devices and after running SPDK's nvmf_tgt, unplugging a SSD cause kernel to > > print out following: > > [ 105.426952] vfio-pci 0000:04:00.0: not ready 2047ms after FLR; waiting > > [ 107.698953] vfio-pci 0000:04:00.0: not ready 4095ms after FLR; waiting > > [ 112.050960] vfio-pci 0000:04:00.0: not ready 8191ms after FLR; waiting > > [ 120.498953] vfio-pci 0000:04:00.0: not ready 16383ms after FLR; waiting > > [ 138.418957] vfio-pci 0000:04:00.0: not ready 32767ms after FLR; waiting > > [ 173.234953] vfio-pci 0000:04:00.0: not ready 65535ms after FLR; giving up > > > > Looks like it is a PCI hotplug racing condition between DPDK's > > eal-intr-thread thread and kernel's pciehp thread. And it causes lockup in > > pci_dev_wait() at kernel side. > > > > When SSD is removed, eal-intr-thread immediately receives > > RTE_INTR_HANDLE_ALARM and handler calls rte_pci_detach_dev() and at kernel > > side vfio_pci_release() is triggered to release this vfio device, which > > calls pci_try_reset_function(), then _pci_reset_function_locked(). > > pci_try_reset_function acquires the device lock but > > _pci_reset_function_locked() doesn't return, therefore lock is NOT released. To what extent does vfio-pci need to learn about surprise hotplug? My expectation is that the current state of the code would only support cooperative hotplug. When a device is surprise removed, what backs a user's mmaps? AIUI, we don't have a revoke interface to invalidate these. We should probably start with an RFE or some development effort to harden vfio-pci for surprise hotplug, it's not surprising it doesn't just work TBH. Thanks, Alex > > Inside _pci_reset_function_locked(), pcie_has_flr(), pci_pm_reset(), etc. > > call pci_dev_wait() at the end but it doesn't return and print out above > > message until 65s timeout. > > > > At kernel pciehp side, it also detects the removal but doesn't run > > immediately as it is configured as "pciehp.pciehp_poll_time=5". So a couple > > of seconds later, it calls pciehp_unconfigure_device -> pci_walk_bus -> > > pci_dev_set_disconnected. pci_dev_set_disconnected() couldn't get the device > > lock and is stuck too because the lock is hold by eal-intr-thread. > > > > The first issue is in pci_dev_wait(). It calls pci_read_config_dword() and > > only when id is not all ones, it can return. But when SSD is physically > > removed, id retrieved is always all ones therefore, it has to wait for FLR > > 65s timeout to return. > > > > I did the following to check return value of pci_read_config_dword() to fix > > this: > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -4439,7 +4439,11 @@ static int pci_dev_wait(struct pci_dev *dev, char > > *reset_type, int timeout) > > > > msleep(delay); > > delay *= 2; > > - pci_read_config_dword(dev, PCI_COMMAND, &id); > > + if (pci_read_config_dword(dev, PCI_COMMAND, &id) == > > + PCIBIOS_DEVICE_NOT_FOUND) { > > + pci_info(dev, "device disconnected\n"); > > + return -ENODEV; > > + } > > } > > > > if (delay > 1000) > > > > The second issue is that due to lock up described above, the > > pci_dev_set_disconnected() is stuck and pci_read_config_dword() won't return > > PCIBIOS_DEVICE_NOT_FOUND. > > > > I didn't find a easy way to fix it. Maybe use device lock in > > pci_dev_set_disconnected() is too coarse and we need a finer device > > err_state lock? > > > > BTW, pci_dev_set_disconnected wasn't using device lock until this change > > a6bd101b8f. > > > > Any suggestions to fix this problem? > > Would you mind opening a report at https://bugzilla.kernel.org and > attaching the complete dmesg log and "lspci -vv" output? > > Out of curiosity, why do you use "pciehp.pciehp_poll_time=5"? > > Bjorn