On Tue, Oct 1, 2019 at 3:09 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > On Mon, Sep 30, 2019 at 12:08:46PM +1000, Oliver O'Halloran wrote: > > This is all powerpc, so I assume Michael will handle this. Just > random things I noticed; ignore if they don't make sense: > > > On PowerNV we use the pcibios_sriov_enable() hook to do two things: > > > > 1. Create a pci_dn structure for each of the VFs, and > > 2. Configure the PHB's internal BARs that map MMIO ranges to PEs > > so that each VF has it's own PE. Note that the PE also determines > > s/it's/its/ > > > the IOMMU table the HW uses for the device. > > > > Currently we do not set the pe_number field of the pci_dn immediately after > > assigning the PE number for the VF that it represents. Instead, we do that > > in a fixup (see pnv_pci_dma_dev_setup) which is run inside the > > pcibios_add_device() hook which is run prior to adding the device to the > > bus. > > > > On PowerNV we add the device to it's IOMMU group using a bus notifier and > > s/it's/its/ > > > in order for this to work the PE number needs to be known when the bus > > notifier is run. This works today since the PE number is set in the fixup > > which runs before adding the device to the bus. However, if we want to move > > the fixup to a later stage this will break. > > > > We can fix this by setting the pdn->pe_number inside of > > pcibios_sriov_enable(). There's no good to avoid this since we already have > > s/no good/no good reason/ ? > > Not quite sure what "this" refers to ... "no good reason to avoid > setting pdn->pe_number in pcibios_sriov_enable()"? The double > negative makes it a little hard to parse. I agree it's a bit vague, I'll re-word it. > > all the required information at that point, so... do that. Moving this > > earlier does cause two problems: > > > > 1. We trip the WARN_ON() in the fixup code, and > > 2. The EEH core will clear pdn->pe_number while recovering VFs. > > > > The only justification for either of these is a comment in eeh_rmv_device() > > suggesting that pdn->pe_number *must* be set to IODA_INVALID_PE in order > > for the VF to be scanned. However, this comment appears to have no basis in > > reality so just delete it. > > > > Signed-off-by: Oliver O'Halloran <oohall@xxxxxxxxx> > > --- > > Can't get rid of the fixup entirely since we need it to set the > > ioda_pe->pdev back-pointer. I'll look at killing that another time. > > --- > > arch/powerpc/kernel/eeh_driver.c | 6 ------ > > arch/powerpc/platforms/powernv/pci-ioda.c | 19 +++++++++++++++---- > > arch/powerpc/platforms/powernv/pci.c | 4 ---- > > 3 files changed, 15 insertions(+), 14 deletions(-) > > > > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c > > index d9279d0..7955fba 100644 > > --- a/arch/powerpc/kernel/eeh_driver.c > > +++ b/arch/powerpc/kernel/eeh_driver.c > > @@ -541,12 +541,6 @@ static void eeh_rmv_device(struct eeh_dev *edev, void *userdata) > > > > pci_iov_remove_virtfn(edev->physfn, pdn->vf_index); > > edev->pdev = NULL; > > - > > - /* > > - * We have to set the VF PE number to invalid one, which is > > - * required to plug the VF successfully. > > - */ > > - pdn->pe_number = IODA_INVALID_PE; > > #endif > > if (rmv_data) > > list_add(&edev->rmv_entry, &rmv_data->removed_vf_list); > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c > > index 5e3172d..70508b3 100644 > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > > @@ -1558,6 +1558,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs) > > > > /* Reserve PE for each VF */ > > for (vf_index = 0; vf_index < num_vfs; vf_index++) { > > + int vf_devfn = pci_iov_virtfn_devfn(pdev, vf_index); > > + int vf_bus = pci_iov_virtfn_bus(pdev, vf_index); > > + struct pci_dn *vf_pdn; > > + > > if (pdn->m64_single_mode) > > pe_num = pdn->pe_num_map[vf_index]; > > else > > @@ -1570,13 +1574,11 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs) > > pe->pbus = NULL; > > pe->parent_dev = pdev; > > pe->mve_number = -1; > > - pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) | > > - pci_iov_virtfn_devfn(pdev, vf_index); > > + pe->rid = (vf_bus << 8) | vf_devfn; > > > > pe_info(pe, "VF %04d:%02d:%02d.%d associated with PE#%x\n", > > Not related to *this* patch, but this looks like maybe it's supposed > to match the pci_name(), e.g., "%04x:%02x:%02x.%d" from > pci_setup_device()? If so, the "%04d:%02d:%02d" here will be > confusing since the decimal & hex won't always match. That looks plain wrong. I'll send a separate patch to fix it. > > hose->global_number, pdev->bus->number, > > Consider pci_domain_nr(bus) instead of hose->global_number? It would > be nice if you had the pci_dev * for each VF so you could just use > pci_name(vf) instead of all this domain/bus/PCI_SLOT/FUNC. Unfortunately, we don't have pci_devs for the VFs when pcibios_sriov_enable() is called. On powernv (and pseries) we only permit config accesses to a BDF when a pci_dn exists for that BDF because the platform code assumes that one will exist. As a result we can't scan the VFs until after pcibios_sriov_enable() is called since that's where pci_dn's are created for the VFs. I'm working on removing the use of pci_dn from powernv entirely though. Once that's done we should revisit whether any of this infrastructure is necessary... Oliver