On Fri, Jan 14, 2022 at 05:42:48PM +0800, Yicong Yang wrote: > On 2022/1/14 0:45, Lukasz Maniak wrote: > > On Wed, Jan 12, 2022 at 08:49:03AM -0600, Bjorn Helgaas wrote: > >> On Wed, Dec 22, 2021 at 08:19:57PM +0100, Lukasz Maniak wrote: > >>> As per PCI Express specification, FLR to a PF resets the PF state as > >>> well as the SR-IOV extended capability including VF Enable which means > >>> that VFs no longer exist. > >> > >> Can you add a specific reference to the spec, please? > >> > > Following the Single Root I/O Virtualization and Sharing Specification: > > 2.2.3. FLR That Targets a PF > > PFs must support FLR. > > FLR to a PF resets the PF state as well as the SR-IOV extended > > capability including VF Enable which means that VFs no longer exist. > > > > For PCI Express Base Specification Revision 5.0 and later, this is > > section 9.2.2.3. This is also the section in the new PCIe r6.0. Let's use that. > >>> Currently, the IOV state is not updated during FLR, resulting in > >>> non-compliant PCI driver behavior. > >> > >> And include a little detail about what problem is observed? How would > >> a user know this problem is occurring? > >> > > The problem is that the state of the kernel and HW as to the number of > > VFs gets out of sync after FLR. > > > > This results in further listing, after the FLR is performed by the HW, > > of VFs that actually no longer exist and should no longer be reported on > > the PCI bus. lspci return FFs for these VFs. > > There're some exceptions. Take HiSilicon's hns3 and sec device as an > example, the VF won't be destroyed after the FLR reset. If FLR on an hns3 PF does *not* clear VF Enable, and the VFs still exist after FLR, isn't that a violation of sec 9.2.2.3? If hns3 and sec don't conform to the spec, we should have some sort of quirk that serves to document and work around this. > Currently the transactions with the VF will be restored after the > FLR. But this patch will break that, the VF is fully disabled and > the transaction cannot be restored. User needs to reconfigure it, > which is unnecessary before this patch. What does it mean for a "transaction to be restored"? Maybe you mean this patch removes the *VFs* via sriov_del_vfs(), and whoever initiated the FLR would need to re-enable VFs via pci_enable_sriov() or something similar? If FLR disables VFs, it seems like we should expect to have to re-enable them if we want them. > Can we handle this problem in another way? Maybe test the VF's > vendor device ID after the FLR reset to see whether it has really > gone or not? > > > sriov_numvfs in sysfs returns old invalid value and does not allow > > setting a new value before explicitly setting 0 in the first place. > > > >>> This patch introduces a simple function, called on the FLR path, that > >>> removes the virtual function devices from the PCI bus and their > >>> corresponding sysfs links with a final clear of the num_vfs value in IOV > >>> state. > >>> > >>> Signed-off-by: Lukasz Maniak <lukasz.maniak@xxxxxxxxxxxxxxx> > >>> --- > >>> drivers/pci/iov.c | 21 +++++++++++++++++++++ > >>> drivers/pci/pci.c | 2 ++ > >>> drivers/pci/pci.h | 4 ++++ > >>> 3 files changed, 27 insertions(+) > >>> > >>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > >>> index 0267977c9f17..69ee321027b4 100644 > >>> --- a/drivers/pci/iov.c > >>> +++ b/drivers/pci/iov.c > >>> @@ -1013,6 +1013,27 @@ int pci_iov_bus_range(struct pci_bus *bus) > >>> return max ? max - bus->number : 0; > >>> } > >>> > >>> +/** > >>> + * pci_reset_iov_state - reset the state of the IOV capability > >>> + * @dev: the PCI device > >>> + */ > >>> +void pci_reset_iov_state(struct pci_dev *dev) > >>> +{ > >>> + struct pci_sriov *iov = dev->sriov; > >>> + > >>> + if (!dev->is_physfn) > >>> + return; > >>> + if (!iov->num_VFs) > >>> + return; > >>> + > >>> + sriov_del_vfs(dev); > >>> + > >>> + if (iov->link != dev->devfn) > >>> + sysfs_remove_link(&dev->dev.kobj, "dep_link"); > >>> + > >>> + iov->num_VFs = 0; > >>> +} > >>> + > >>> /** > >>> * pci_enable_sriov - enable the SR-IOV capability > >>> * @dev: the PCI device > >>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > >>> index 3d2fb394986a..535f19d37e8d 100644 > >>> --- a/drivers/pci/pci.c > >>> +++ b/drivers/pci/pci.c > >>> @@ -4694,6 +4694,8 @@ EXPORT_SYMBOL(pci_wait_for_pending_transaction); > >>> */ > >>> int pcie_flr(struct pci_dev *dev) > >>> { > >>> + pci_reset_iov_state(dev); > >>> + > >>> if (!pci_wait_for_pending_transaction(dev)) > >>> pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n"); > >>> > >>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > >>> index 3d60cabde1a1..7bb144fbec76 100644 > >>> --- a/drivers/pci/pci.h > >>> +++ b/drivers/pci/pci.h > >>> @@ -480,6 +480,7 @@ void pci_iov_update_resource(struct pci_dev *dev, int resno); > >>> resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno); > >>> void pci_restore_iov_state(struct pci_dev *dev); > >>> int pci_iov_bus_range(struct pci_bus *bus); > >>> +void pci_reset_iov_state(struct pci_dev *dev); > >>> extern const struct attribute_group sriov_pf_dev_attr_group; > >>> extern const struct attribute_group sriov_vf_dev_attr_group; > >>> #else > >>> @@ -501,6 +502,9 @@ static inline int pci_iov_bus_range(struct pci_bus *bus) > >>> { > >>> return 0; > >>> } > >>> +static inline void pci_reset_iov_state(struct pci_dev *dev) > >>> +{ > >>> +} > >>> > >>> #endif /* CONFIG_PCI_IOV */ > >>> > >>> > >>> base-commit: fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf > >>> -- > >>> 2.25.1 > >>> > > . > >