On Wed, Feb 14, 2024 at 09:16:18AM +0200, Leon Romanovsky wrote:
> On Tue, Feb 13, 2024 at 01:45:56PM -0600, Bjorn Helgaas wrote:
> > On Tue, Feb 13, 2024 at 07:46:02PM +0200, Leon Romanovsky wrote:
> > > On Tue, Feb 13, 2024 at 09:59:54AM -0600, Bjorn Helgaas wrote:
> > > ...
> >
> > > > I guess that means that if we apply this revert, the problem Pierre
> > > > reported will return. Obviously the deadlock is more important than
> > > > the inconsistency Pierre observed, but from the user's point of view
> > > > this will look like a regression.
> > > >
> > > > Maybe listening to netlink and then looking at sysfs isn't the
> > > > "correct" way to do this, but I don't want to just casually break
> > > > existing user code. If we do contemplate doing the revert, at the
> > > > very least we should include specific details about what the user code
> > > > *should* do instead, at the level of the actual commands to use
> > > > instead of "ip monitor dev; cat ${path}/device/sriov_numvfs".
> > >
> > > udevadm monitor will do the trick.
> > >
> > > Another possible solution is to refactor the code to make sure that
> > > .probe on VFs happens only after sriov_numvfs is updated.
> >
> > I like the idea of refactoring it so as to preserve the existing
> > ordering while also fixing the deadlock.
>
> I think something like this will be enough (not tested). It will set the number of VFs
> before we make VFs visible to probe:

I'll push a v3, replacing the second patch with this one instead.
Although based on this discussion it seems we're moving towards
squashing the revert with Leon's suggested patch. Bjorn, I'll assume
you're still OK with just squashing these on your end.

I would like some input on how to actually test this though.
Presumably we see some event on device PF and we want to make sure
that if we read PF/device/sriov_numvfs we see the updated value. But
the only type of event I think we can expect is the PF's sriov_numvfs
CHANGE event.

Is there any way for VFs to be created outside of writing to the
sriov_numvfs sysfs file? My understanding is some older
devices/drivers will auto-create VFs when the PF is initialized, but
it wasn't clear from the bug report whether that was part of the
configuration here. Pierre, do you have any recollection on this?

Or maybe testing for this case just means compile and verify with
udevadm monitor that we see the CHANGE event before any of the VFs
are actually created...

>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index aaa33e8dc4c9..0cdfaae80594 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -679,12 +679,14 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
>  	msleep(100);
>  	pci_cfg_access_unlock(dev);
>  
> +	iov->num_VFs = nr_virtfn;
>  	rc = sriov_add_vfs(dev, initial);
> -	if (rc)
> +	if (rc) {
> +		iov->num_VFs = 0;
>  		goto err_pcibios;
> +	}
>  
>  	kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
> -	iov->num_VFs = nr_virtfn;
>  
>  	return 0;
>  
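
For reasoning about the ordering in the quoted diff without hardware,
a rough userspace model may help. This is only a sketch, not kernel
code: struct model_iov, model_add_vfs(), model_sriov_enable(), the
fail flag and seen_by_probe are all hypothetical stand-ins invented
for illustration. It assumes a VF probe would read the PF's VF count
while sriov_add_vfs() runs, and shows that with the count assigned
first the probe sees the new value, and that a failure rolls it back
to 0.

```c
/* Hypothetical stand-in for struct pci_sriov; only the field under discussion. */
struct model_iov {
	int num_VFs;
};

/* Value a modelled VF probe read from the PF; -1 means no probe ran yet. */
static int seen_by_probe = -1;

/*
 * Hypothetical stand-in for sriov_add_vfs(): this is the point where VFs
 * become visible and a probe could read the PF's VF count.
 * 'fail' forces the error path.
 */
static int model_add_vfs(struct model_iov *iov, int fail)
{
	seen_by_probe = iov->num_VFs;
	return fail ? -1 : 0;
}

/* Mirrors the ordering in the quoted diff: set the count, add VFs, roll back on error. */
static int model_sriov_enable(struct model_iov *iov, int nr_virtfn, int fail)
{
	int rc;

	iov->num_VFs = nr_virtfn;
	rc = model_add_vfs(iov, fail);
	if (rc) {
		iov->num_VFs = 0;
		return rc;
	}

	/* The real function emits the KOBJ_CHANGE uevent at this point. */
	return 0;
}
```

Calling model_sriov_enable(&iov, 4, 0) leaves both iov.num_VFs and
seen_by_probe at 4; with fail set, the probe still saw the requested
count but iov.num_VFs ends at 0. It says nothing about uevent timing
as seen from userspace, which still needs checking on a real PF.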