On Sun, Feb 11, 2024 at 11:15:44AM -0800, Kuppuswamy Sathyanarayanan wrote: > > On 2/11/24 12:48 AM, Leon Romanovsky wrote: > > On Fri, Feb 09, 2024 at 07:20:28PM -0800, Kuppuswamy Sathyanarayanan wrote: > >> On 2/9/24 3:52 PM, Jim Harris wrote: > >>> If an SR-IOV enabled device is held by vfio, and the device is removed, > >>> vfio will hold device lock and notify userspace of the removal. If > >>> userspace reads the sriov_numvfs sysfs entry, that thread will be blocked > >>> since sriov_numvfs_show() also tries to acquire the device lock. If that > >>> same thread is responsible for releasing the device to vfio, it results in > >>> a deadlock. > >>> > >>> The proper way to detect a change to the num_VFs value is to listen for a > >>> sysfs event, not to add a device_lock() on the attribute _show() in the > >>> kernel. > >> Since you are reverting a commit that synchronizes SysFS read > >> /write, please add some comments about why it is not an > >> issue anymore. > > It was never an issue, the idea that sysfs read and write should be serialized by kernel > > is not correct by definition. > > What: /sys/bus/pci/devices/.../sriov_numvfs > Date: November 2012 > Contact: Donald Dutile <ddutile@xxxxxxxxxx> > Description: > This file appears when a physical PCIe device supports SR-IOV. > Userspace applications can read and write to this file to > determine and control the enablement or disablement of Virtual > Functions (VFs) on the physical function (PF). A read of this > file will return the number of VFs that are enabled on this PF. > > I am not very clear about the user of this SysFs. But, as per above description, > this sysfs seems to controls the number of VFs. A typical usage is to allow user > to write a value and then read to check the enabled/disabled number of VMs, > right? No, typical usage is to write a value and observe spawned VFs. The read from sysfs is allowed for completeness and performed in sequence with write. Any attempt to read and write in parallel is prone to races by definition as they are not controlled by the kernel. > > If you are not synchronizing, then the value returned may not reflect the actual > number of enabled / disabled VFs. So wont this change affect the existing user > of this SysFS. No, it won't. Users were supposed to synchronize their read and write before calling sysfs. If they didn't do it, they already have broken code. Please read original discussion pointed by Jim. Thanks > > > > > Thanks > > > >>> This reverts commit 35ff867b76576e32f34c698ccd11343f7d616204. > >>> Revert had a small conflict, the sprintf() is now changed to sysfs_emit(). > >>> > >>> Link: https://lore.kernel.org/linux-pci/ZXJI5+f8bUelVXqu@ubuntu/ > >>> Suggested-by: Leon Romanovsky <leonro@xxxxxxxxxx> > >>> Reviewed-by: Leon Romanovsky <leonro@xxxxxxxxxx> > >>> Signed-off-by: Jim Harris <jim.harris@xxxxxxxxxxx> > >>> --- > >>> drivers/pci/iov.c | 8 +------- > >>> 1 file changed, 1 insertion(+), 7 deletions(-) > >>> > >>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > >>> index aaa33e8dc4c9..0ca20cd518d5 100644 > >>> --- a/drivers/pci/iov.c > >>> +++ b/drivers/pci/iov.c > >>> @@ -395,14 +395,8 @@ static ssize_t sriov_numvfs_show(struct device *dev, > >>> char *buf) > >>> { > >>> struct pci_dev *pdev = to_pci_dev(dev); > >>> - u16 num_vfs; > >>> - > >>> - /* Serialize vs sriov_numvfs_store() so readers see valid num_VFs */ > >>> - device_lock(&pdev->dev); > >>> - num_vfs = pdev->sriov->num_VFs; > >>> - device_unlock(&pdev->dev); > >>> > >>> - return sysfs_emit(buf, "%u\n", num_vfs); > >>> + return sysfs_emit(buf, "%u\n", pdev->sriov->num_VFs); > >>> } > >>> > >>> /* > >>> > >> -- > >> Sathyanarayanan Kuppuswamy > >> Linux Kernel Developer > >> > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer > >