Re: Locking between vfio hot-remove and pci sysfs sriov_numvfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 07, 2023 at 07:48:10PM -0400, Jason Gunthorpe wrote:
> On Thu, Dec 07, 2023 at 04:21:48PM -0700, Alex Williamson wrote:
> > On Thu, 7 Dec 2023 22:38:23 +0000
> > Jim Harris <jim.harris@xxxxxxxxxxx> wrote:
> > 
> > device_lock() has been a recurring problem.  We don't have a lot of
> > leeway in how we support the driver remove callback, the device needs
> > to be released.  We can't return -EBUSY and I don't think we can drop
> > the mutex while we're waiting on userspace.
> 
> The mechanism of waiting in remove for userspace is inherently flawed,
> it can never work fully correctly. :( I've hit this many times.
> 
> Upon remove VFIO should immediately remove itself and leave behind a
> non-functional file descriptor. Userspace should catch up eventually
> and see it is toast.
> 
> The kernel locking model just cannot support userspace delaying this
> process.
> 
> Jason

Maybe for now we just whack this specific mole with a separate mutex
for synchronizing access to sriov->num_VFs in the sysfs paths?
Something like this (tested on my system):

---

Author: Jim Harris <jim.harris@xxxxxxxxxxx>

pci: sync sriov->num_VFs sysfs access with its own mutex

If SR-IOV enabled device is held by vfio, and device is removed, vfio will hold
device lock and notify userspace of the removal. If userspace reads sriov_numvfs
sysfs entry, that thread will be blocked since sriov_numvfs_show() also tries
to acquire the device lock. If that same thread is responsible for releasing the
device to vfio, it results in a deadlock.

So add a separate mutex, specifically for struct pci_sriov. Use this to
synchronize accesses to sriov_numvfs in the sysfs paths. sriov_numvfs_store()
will also still hold the device lock while configuring sriov.

Fixes: 35ff867b765 ("PCI/IOV: Serialize sysfs sriov_numvfs reads vs writes")

Signed-off-by: Jim Harris <jim.harris@xxxxxxxxxxx>

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 25dbe85c4217..8910cf6c97be 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -398,9 +398,9 @@ static ssize_t sriov_numvfs_show(struct device *dev,
 	u16 num_vfs;
 
 	/* Serialize vs sriov_numvfs_store() so readers see valid num_VFs */
-	device_lock(&pdev->dev);
+	mutex_lock(&pdev->sriov->lock);
 	num_vfs = pdev->sriov->num_VFs;
-	device_unlock(&pdev->dev);
+	mutex_unlock(&pdev->sriov->lock);
 
 	return sysfs_emit(buf, "%u\n", num_vfs);
 }
@@ -427,6 +427,7 @@ static ssize_t sriov_numvfs_store(struct device *dev,
 		return -ERANGE;
 
 	device_lock(&pdev->dev);
+	mutex_lock(&pdev->sriov->lock);
 
 	if (num_vfs == pdev->sriov->num_VFs)
 		goto exit;
@@ -468,6 +469,7 @@ static ssize_t sriov_numvfs_store(struct device *dev,
 			 num_vfs, ret);
 
 exit:
+	mutex_unlock(&pdev->sriov->lock);
 	device_unlock(&pdev->dev);
 
 	if (ret < 0)
@@ -808,6 +810,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
 		nres++;
 	}
 
+	mutex_init(&iov->lock);
 	iov->pos = pos;
 	iov->nres = nres;
 	iov->ctrl = ctrl;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5ecbcf041179..04e636ab50e5 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -313,6 +313,7 @@ struct pci_sriov {
 	u16		subsystem_device; /* VF subsystem device */
 	resource_size_t	barsz[PCI_SRIOV_NUM_BARS];	/* VF BAR size */
 	bool		drivers_autoprobe; /* Auto probing of VFs by driver */
+	struct mutex	lock; /* for synchronizing num_VFs sysfs accesses */
 };
 
 #ifdef CONFIG_PCI_DOE




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux