On Tue, Mar 07, 2017 at 02:15:29PM -0600, Bjorn Helgaas wrote: >On Mon, Feb 27, 2017 at 11:18:32AM +1100, Gavin Shan wrote: >> The PowerNV platform is the only user of pcibios_sriov_disable(). >> The IOV BAR could be shifted by pci_iov_update_resource(). The >> warning message in the function is printed if the IOV capability >> is in enabled (PCI_SRIOV_CTRL_VFE && PCI_SRIOV_CTRL_MSE) state. >> >> pci_disable_sriov >> sriov_disable >> pnv_pci_sriov_disable >> pnv_pci_vf_resource_shift >> pci_update_resource >> pci_iov_update_resource >> >> This fixes the issue by disabling IOV capability before calling >> pcibios_sriov_disable(). With it, the disabling path matches with >> the enabling path: pcibios_sriov_enable() is called before the >> IOV capability is enabled. > >I'm vaguely uncomfortable about this path: > > pci_disable_sriov > sriov_disable > pcibios_sriov_disable # powerpc version > pnv_pci_sriov_disable > pnv_pci_vf_resource_shift > res = &dev->resource[i + PCI_IOV_RESOURCES] > res->start += size * offset > pci_update_resource > pci_iov_update_resource > pnv_pci_vf_release_m64 > >1) "res" is already in the resource tree, so we shouldn't be changing > its start address, because that may make the tree inconsistent, > e.g., the resource may no longer be completely contained in its > parent, it may conflict with a sibling, etc. > >2) If we update "res->start", shouldn't we update "res->end" > correspondingly? > >It seems like it'd be better if we didn't update the device resources >in the enable/disable paths. If we could do the resource adjustments >earlier, somewhere before we give the device to a driver, it seems >like it would avoid these issues. > >We might have talked about these questions in the past, so I apologize >if you've already explained this. If that's the case, maybe we just >need some comments in the code to help the next confused reader. > Bjorn, thanks for review. I agree it's not perfect. We discussed this long time ago as I can remember. Let me try to make it a bit more clear: In our PHB hardware, there are 16 MMIO BARs. Each of them can be shared by 256 PEs (A) and owned exclusively by one PE (B). When VF BAR size is small enough, we take (A). Otherwise, we have to take (B). Only when taking (A), we need expand/move/shrink the IOV BAR. So lets stick to (A) for discussion here. Under (A), PF's IOV BAR size is extended to ((256 * (VF BAR size)) when the PF is probed. Then the @res, which corresponds to the IOV BAR, is assigned and put into the resource tree during resource sizing and assignment stage. The IOV capability is going to be enabled by PF's driver or sysfs entry, it calls into pnv_pci_sriov_enable() where number of contigous PEs (equal to number of VFs to be enabled) are allocated. We shift the IOV BAR base according to the starting PE number of the allocated block. Afterewards, the IOV BAR is restored when the IOV capability is disabled. So it's all about the PE. The IOV BAR's end address isn't touched, we needn't update @res->end when restoring the IOV BAR. In order to avoid moving IOV BAR base address, I need know the the PEs for the VFs before resourcd sizing and assignment stage. It means I need to reserve PEs in advance, which isn't nice because we never enable the VFs. In that case, the PEs are wasted. Yeah, it's nice to have add some comments in pnv_pci_vf_resource_shift() where pci_update_resource() is called. I will post another patch to linux-ppcdev and you'll be copied. If you agree, I think you can merge this patch as none of the concerns are too much related. Thanks, Gavin