Re: Locking between vfio hot-remove and pci sysfs sriov_numvfs

On Thu, 7 Dec 2023 22:38:23 +0000
Jim Harris <jim.harris@xxxxxxxxxxx> wrote:

> I am seeing a deadlock using SPDK with hotplug detection using vfio-pci
> and an SR-IOV enabled NVMe SSD. It is not clear if this deadlock is intended
> or if it's a kernel bug.
> 
> Note: SPDK uses DPDK's PCI device enumeration framework, so I'll reference
> both SPDK and DPDK in this description.
> 
> DPDK registers an eventfd with vfio for hotplug notifications. If the associated
> device is removed (i.e. write 1 to its pci sysfs remove entry), vfio
> writes to the eventfd, requesting DPDK to release the device. It does this
> while holding the device_lock(), and then waits for completion.
> 
> DPDK gets the notification, and passes it up to SPDK. SPDK does not release
> the device immediately. It has some asynchronous operations that need to be
> performed first, so it will release the device a bit later.
> 
> But before the device is released, SPDK also triggers DPDK to do a sysfs scan
> looking for newly inserted devices. Note that the removed device is not
> completely removed yet from kernel PCI perspective - all of its sysfs entries
> are still available, including sriov_numvfs.
> 
> DPDK explicitly reads sriov_numvfs to see if the device is SR-IOV capable.
> SPDK itself doesn't actually use this value, but it is part of the scan
> triggered by SPDK and directly leads to the deadlock. sriov_numvfs_show()
> deadlocks because it tries to hold device_lock() while reading the pci
> device's pdev->sriov->num_VFs.
> 
> We're able to work around this in SPDK by deferring the sysfs scan if
> a device removal is in progress. And maybe that is what we are supposed to
> be doing, to avoid this deadlock?
> 
> Reference to SPDK issue, for some more details (plus simple repro steps for
> anyone already familiar with SPDK): https://github.com/spdk/spdk/issues/3205

device_lock() has been a recurring problem.  We don't have a lot of
leeway in how we support the driver remove callback; the device needs
to be released.  We can't return -EBUSY and I don't think we can drop
the mutex while we're waiting on userspace.

I've done some fix-ups in the past to use device_trylock() to avoid
deadlocks, which might be an option here, e.g. reading sriov_numvfs
could return -EBUSY in this scenario.  We keep running into these
scenarios though, and we might just need to pick a point at which we
kill the user process holding the device.
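
To make the trylock idea concrete, here is a rough userspace model, not
kernel code and not a proposed patch.  A pthread mutex stands in for
device_lock(), and struct model_dev and model_numvfs_show() are made-up
names for illustration only.  The point is just the shape: the read path
never blocks on the lock and reports -EBUSY while someone else (e.g. a
pending remove) holds it.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the PCI device; not a kernel structure. */
struct model_dev {
    pthread_mutex_t lock;   /* models device_lock() */
    unsigned int num_vfs;   /* models pdev->sriov->num_VFs */
};

/*
 * Models a sriov_numvfs read that refuses to block.
 * Returns the number of bytes written to buf, or -EBUSY if the
 * lock is currently held (for example by a remove in progress).
 */
static int model_numvfs_show(struct model_dev *dev, char *buf, size_t len)
{
    int n;

    if (pthread_mutex_trylock(&dev->lock) != 0)
        return -EBUSY;      /* do not wait behind the remove path */

    n = snprintf(buf, len, "%u\n", dev->num_vfs);
    pthread_mutex_unlock(&dev->lock);
    return n;
}
```

With this shape, a reader that races with a remove gets an error it can
retry or ignore instead of sleeping on the lock indefinitely.  Whether
-EBUSY is an acceptable result for existing sriov_numvfs readers is an
open question.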

I'm open to suggestions.  Thanks,

Alex