On 6/2/19 6:40 PM, Alex Williamson wrote:
On Sun, 2 Jun 2019 19:44:14 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
[+cc Alex, Lukas]
On Fri, May 31, 2019 at 09:55:20AM -0700, JD Zheng wrote:
Hello,
I am running DPDK 18.11+SPDK 19.04 with v5.1 kernel. DPDK/SPDK uses SSD vfio
devices and after running SPDK's nvmf_tgt, unplugging a SSD cause kernel to
print out following:
[ 105.426952] vfio-pci 0000:04:00.0: not ready 2047ms after FLR; waiting
[ 107.698953] vfio-pci 0000:04:00.0: not ready 4095ms after FLR; waiting
[ 112.050960] vfio-pci 0000:04:00.0: not ready 8191ms after FLR; waiting
[ 120.498953] vfio-pci 0000:04:00.0: not ready 16383ms after FLR; waiting
[ 138.418957] vfio-pci 0000:04:00.0: not ready 32767ms after FLR; waiting
[ 173.234953] vfio-pci 0000:04:00.0: not ready 65535ms after FLR; giving up
Looks like it is a PCI hotplug racing condition between DPDK's
eal-intr-thread thread and kernel's pciehp thread. And it causes lockup in
pci_dev_wait() at kernel side.
When SSD is removed, eal-intr-thread immediately receives
RTE_INTR_HANDLE_ALARM and handler calls rte_pci_detach_dev() and at kernel
side vfio_pci_release() is triggered to release this vfio device, which
calls pci_try_reset_function(), then _pci_reset_function_locked().
pci_try_reset_function acquires the device lock but
_pci_reset_function_locked() doesn't return, therefore lock is NOT released.
To what extent does vfio-pci need to learn about surprise hotplug? My
expectation is that the current state of the code would only support
cooperative hotplug. When a device is surprise removed, what backs a
user's mmaps? AIUI, we don't have a revoke interface to invalidate
these. We should probably start with an RFE or some development effort
to harden vfio-pci for surprise hotplug, it's not surprising it doesn't
just work TBH. Thanks,
Alex
I did see other issues that DPDK unmap vifo device memory was stuck due
to surprise removal.
Are you saying that vfio-pci surprise removal is not fully implemented yet?
Inside _pci_reset_function_locked(), pcie_has_flr(), pci_pm_reset(), etc.
call pci_dev_wait() at the end but it doesn't return and print out above
message until 65s timeout.
At kernel pciehp side, it also detects the removal but doesn't run
immediately as it is configured as "pciehp.pciehp_poll_time=5". So a couple
of seconds later, it calls pciehp_unconfigure_device -> pci_walk_bus ->
pci_dev_set_disconnected. pci_dev_set_disconnected() couldn't get the device
lock and is stuck too because the lock is hold by eal-intr-thread.
The first issue is in pci_dev_wait(). It calls pci_read_config_dword() and
only when id is not all ones, it can return. But when SSD is physically
removed, id retrieved is always all ones therefore, it has to wait for FLR
65s timeout to return.
I did the following to check return value of pci_read_config_dword() to fix
this:
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4439,7 +4439,11 @@ static int pci_dev_wait(struct pci_dev *dev, char
*reset_type, int timeout)
msleep(delay);
delay *= 2;
- pci_read_config_dword(dev, PCI_COMMAND, &id);
+ if (pci_read_config_dword(dev, PCI_COMMAND, &id) ==
+ PCIBIOS_DEVICE_NOT_FOUND) {
+ pci_info(dev, "device disconnected\n");
+ return -ENODEV;
+ }
}
if (delay > 1000)
The second issue is that due to lock up described above, the
pci_dev_set_disconnected() is stuck and pci_read_config_dword() won't return
PCIBIOS_DEVICE_NOT_FOUND.
I didn't find a easy way to fix it. Maybe use device lock in
pci_dev_set_disconnected() is too coarse and we need a finer device
err_state lock?
BTW, pci_dev_set_disconnected wasn't using device lock until this change
a6bd101b8f.
Any suggestions to fix this problem?
Would you mind opening a report at https://bugzilla.kernel.org and
attaching the complete dmesg log and "lspci -vv" output?
I submitted one as https://bugzilla.kernel.org/show_bug.cgi?id=203797
Out of curiosity, why do you use "pciehp.pciehp_poll_time=5"?
This was chosen by other engineer. I tried shorter polling time, which
doesn't make difference.
I also tried pciehp.pciehp_poll_mode=0 but hotplug doesn't seem working.
Should I use poll mode or not?
Bjorn