On Thu, Feb 08, 2024 at 06:30:02PM -0600, Bjorn Helgaas wrote:
> [+cc Pierre, author of 35ff867b7657 ("PCI/IOV: Serialize sysfs
> sriov_numvfs reads vs writes")]
>
> On Wed, Dec 20, 2023 at 10:58:12PM +0000, Jim Harris wrote:
> > If SR-IOV enabled device is held by vfio, and device is removed,
> > vfio will hold device lock and notify userspace of the removal. If
> > userspace reads sriov_numvfs sysfs entry, that thread will be
> > blocked since sriov_numvfs_show() also tries to acquire the device
> > lock. If that same thread is responsible for releasing the device to
> > vfio, it results in a deadlock.
> >
> > One patch was proposed to add a separate mutex, specifically for
> > struct pci_sriov, to synchronize access to sriov_numvfs in the sysfs
> > paths (replacing use of the device_lock()). Leon instead suggested
> > just reverting the commit 35ff867b765 which introduced device_lock()
> > in the store path. This also led to a small fix around ordering on
> > the kobject_uevent() when sriov_numvfs is updated.
> >
> > Ref: https://lore.kernel.org/linux-pci/ZXJI5+f8bUelVXqu@ubuntu/
>
> 1) Cc author of the commit being reverted (Pierre) so he has a chance
> to chime in and make sure the proposed fix works for him as well.

Ack. I'll also Cc Pierre on the v2.

> 2) The revert commit log needs to justify the revert, not merely say
> what the proper way is. The Ref: above suggests that the current code
> (pre-revert) leads to a deadlock in some cases, so the revert commit
> log should detail that.
>
> It's ideal if we never regress, not even between the revert and the
> second patch, so it's possible that they should be squashed into a
> single patch. But if you keep it as two patches, it's trivial for me
> to squash them if we decide that's best.

The deadlock I hit is fixed by patch 1 alone. Patch 2 is a separate
bug - it's better to update the num_VFs value before sending the
notification that the num_VFs value changed. I'll add some more color
to that commit message too, to differentiate it from the revert. I have
no issues if you eventually decide to squash them.

>
> 3) Follow subject line convention for drivers/pci (use "git log
> --oneline drivers/pci" to learn it).

Will fix in v2.

Thanks,
Jim