On Sat, Mar 18, 2017 at 11:19:07AM +1100, Gavin Shan wrote: > On Thu, Mar 09, 2017 at 11:34:32AM +1100, Gavin Shan wrote: > >On Tue, Mar 07, 2017 at 02:15:29PM -0600, Bjorn Helgaas wrote: > >>On Mon, Feb 27, 2017 at 11:18:32AM +1100, Gavin Shan wrote: > >>> The PowerNV platform is the only user of pcibios_sriov_disable(). > >>> The IOV BAR could be shifted by pci_iov_update_resource(). The > >>> warning message in the function is printed if the IOV capability > >>> is in enabled (PCI_SRIOV_CTRL_VFE && PCI_SRIOV_CTRL_MSE) state. > >>> > >>> pci_disable_sriov > >>> sriov_disable > >>> pnv_pci_sriov_disable > >>> pnv_pci_vf_resource_shift > >>> pci_update_resource > >>> pci_iov_update_resource > >>> > >>> This fixes the issue by disabling IOV capability before calling > >>> pcibios_sriov_disable(). With it, the disabling path matches with > >>> the enabling path: pcibios_sriov_enable() is called before the > >>> IOV capability is enabled. > >> > >>I'm vaguely uncomfortable about this path: > >> > >> pci_disable_sriov > >> sriov_disable > >> pcibios_sriov_disable # powerpc version > >> pnv_pci_sriov_disable > >> pnv_pci_vf_resource_shift > >> res = &dev->resource[i + PCI_IOV_RESOURCES] > >> res->start += size * offset > >> pci_update_resource > >> pci_iov_update_resource > >> pnv_pci_vf_release_m64 > >> > >>1) "res" is already in the resource tree, so we shouldn't be changing > >> its start address, because that may make the tree inconsistent, > >> e.g., the resource may no longer be completely contained in its > >> parent, it may conflict with a sibling, etc. > >> > >>2) If we update "res->start", shouldn't we update "res->end" > >> correspondingly? > >> > >>It seems like it'd be better if we didn't update the device resources > >>in the enable/disable paths. If we could do the resource adjustments > >>earlier, somewhere before we give the device to a driver, it seems > >>like it would avoid these issues. > >> > >>We might have talked about these questions in the past, so I apologize > >>if you've already explained this. If that's the case, maybe we just > >>need some comments in the code to help the next confused reader. > >> > > > >Bjorn, thanks for review. I agree it's not perfect. We discussed this long > >time ago as I can remember. Let me try to make it a bit more clear: In our > >PHB hardware, there are 16 MMIO BARs. Each of them can be shared by 256 PEs > >(A) and owned exclusively by one PE (B). When VF BAR size is small enough, > >we take (A). Otherwise, we have to take (B). Only when taking (A), we need > >expand/move/shrink the IOV BAR. So lets stick to (A) for discussion here. > > > >Under (A), PF's IOV BAR size is extended to ((256 * (VF BAR size)) when the > >PF is probed. Then the @res, which corresponds to the IOV BAR, is assigned > >and put into the resource tree during resource sizing and assignment stage. > >The IOV capability is going to be enabled by PF's driver or sysfs entry, it > >calls into pnv_pci_sriov_enable() where number of contigous PEs (equal to > >number of VFs to be enabled) are allocated. We shift the IOV BAR base according > >to the starting PE number of the allocated block. Afterewards, the IOV BAR > >is restored when the IOV capability is disabled. So it's all about the PE. > >The IOV BAR's end address isn't touched, we needn't update @res->end when > >restoring the IOV BAR. > > > >In order to avoid moving IOV BAR base address, I need know the the PEs > >for the VFs before resourcd sizing and assignment stage. It means I need > >to reserve PEs in advance, which isn't nice because we never enable the > >VFs. In that case, the PEs are wasted. > > > >Yeah, it's nice to have add some comments in pnv_pci_vf_resource_shift() > >where pci_update_resource() is called. I will post another patch to > >linux-ppcdev and you'll be copied. If you agree, I think you can merge > >this patch as none of the concerns are too much related. > > > > Sorry, Bjorn, ping! Please let me know if there are more concerns you have. I think we had a misunderstanding -- you mentioned adding some comments and wrote "I will post another patch", and I *thought* you meant you were going to post another version of *this* patch with some updated comments. So I've been waiting for that updated patch. But I think you've been waiting for me to merge *this* patch as-is. To avoid having this discussion a third time in the future, I think you should add some comments at the point where you update the resource. Updating a resource after it's in the resource tree is clearly dangerous, so we need some explanation of why it's sort of OK in this particular case. If you can write a comment and dig up a URL to our previous discussion, I'd like to incorporate that into *this* patch before I merge it. The sooner we can document this, the less work it will be in the future. Bjorn