[AMD Official Use Only - General] > -----Original Message----- > From: Ma, Rui <Rui.Ma@xxxxxxx> > Sent: Tuesday, October 11, 2022 7:19 AM > To: Bjorn Helgaas <helgaas@xxxxxxxxxx> > Cc: bhelgaas@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; linux- > kernel@xxxxxxxxxxxxxxx; Deucher, Alexander > <Alexander.Deucher@xxxxxxx> > Subject: RE: [PATCH] PCI/IOV: Decrease VF memory BAR size to save host > memory occupied by PTEs > > [AMD Official Use Only - General] > > Hi Helgass: > Thank you very much for your suggestions on my patch! > > The patch is a device-specific behavior. In our AMD device SR-IOV, > the actual VF BAR size is dependent on NumVFs too. If only one VF is > created, the VF BAR size will depend on BAR probing algorithm as described > in Section 9.3.3.14, but when several VFs created our own driver will > decrease BAR0 memory size according to NumVFs. So I want to add this > quirk to keep Linux code and certain driver code consistent. > > > Except that this doesn't affect the *starting* address of each VF BAR, > which I guess is what you mean by "BAR memory mapping is always based on > the initial device physical device." > Yes we should not change the starting address or the device cannot load > well. > > > Well, I guess the device still describes its worst-case BAR size; the quirk > basically just optimizes space usage. Right? > Yes. > > > Aren't both virt and phys contiguous and nicely aligned for this case? It > seems like the perfect application for huge pages. > It cannot use huge page though address aligned. + KVM, Christian I think this is just working around a bug in KVM. This is to avoid wasting huge amounts of memory for page tables due to KVM using 4K pages for some reason. I think we should figure out why KVM is not using huge pages for this rather than working around this in the PCI layer. Alex > > >> + shift = 1; > >> + shift = virtfn_get_shift(dev, iov->num_VFs, i); > >Maybe more of the fiddling could be hidden in the quirk, e.g., > > size = quirk_vf_bar_size(dev, iov->num_VFs, i); > > if (!size) > > size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); > If I hide get-shift func in the quirk is it concise to call pci_iov_resource_size() > in quirk? > > And I solved other issues in the patch sent later. Thank you for your > patience! > > > Regards, > Rui > > -----Original Message----- > From: Bjorn Helgaas <helgaas@xxxxxxxxxx> > Sent: Wednesday, September 28, 2022 6:07 AM > To: Ma, Rui <Rui.Ma@xxxxxxx> > Cc: bhelgaas@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; linux- > kernel@xxxxxxxxxxxxxxx > Subject: Re: [PATCH] PCI/IOV: Decrease VF memory BAR size to save host > memory occupied by PTEs > > On Mon, Sep 26, 2022 at 04:05:42PM +0800, Rui Ma wrote: > > In SR_IOV scene, when the device physical space(such as Video RAM)is > > fixed, as the number of VFs increases, the actual BAR memory space > > used by each VF decreases. > > s/SR_IOV/SR-IOV/ to match spec usage. > > I think this is device-specific behavior, right? I don't see anything in the PCIe > spec about the BAR size being dependent on NumVFs. If it's device-specific, > it shouldn't be presented as "for all SR-IOV devices, the actual BAR memory > space decreases as number of VFs increases." > > > However, the BAR memory mapping is always based on the initial device > > physical device. So do not map this unneeded memory can save host > > memory occupied by PTEs. Although each PTE only occupies a few bytes > > of space on its own, a large number of PTEs can still take up a lot of space. > > So IIUC this is basically a quirk to override the "VF BAR aperture" > size, which PCIe r6.0, sec 9.2.1.1.1 says is "determined by the usual BAR > probing algorithm as described in Section 9.3.3.14." > > Except that this doesn't affect the *starting* address of each VF BAR, which I > guess is what you mean by "BAR memory mapping is always based on the > initial device physical device." > > Hmm. This kind of breaks the "plug and play" model of PCI because the > device is no longer self-describing. Well, I guess the device still describes its > worst-case BAR size; the quirk basically just optimizes space usage. Right? > > It's a shame if we can't reduce PTE usage by using hugeTLB pages for this. > Aren't both virt and phys contiguous and nicely aligned for this case? It > seems like the perfect application for huge pages. > > > Signed-off-by: Rui Ma <Rui.Ma@xxxxxxx> > > --- > > drivers/pci/iov.c | 14 ++++++++++++-- > > drivers/pci/pci.h | 15 +++++++++++++++ > > drivers/pci/quirks.c | 37 +++++++++++++++++++++++++++++++++++++ > > 3 files changed, 64 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index > > 952217572113..6b9f9b6b9be1 100644 > > --- a/drivers/pci/iov.c > > +++ b/drivers/pci/iov.c > > @@ -296,6 +296,14 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id) > > struct pci_sriov *iov = dev->sriov; > > struct pci_bus *bus; > > > > + /* > > + * Some SR-IOV device's BAR map range is larger than they can actually > use. > > + * This extra BAR space occupy too much reverse mapping size(physical > page > > + * back to the PTEs). So add a divisor shift parameter to resize the > > + * request resource of VF. > > + */ > > + u16 shift = 1; > > + > > bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id)); > > if (!bus) > > goto failed; > > @@ -328,8 +336,10 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id) > > virtfn->resource[i].name = pci_name(virtfn); > > virtfn->resource[i].flags = res->flags; > > size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); > > + shift = 1; > > + shift = virtfn_get_shift(dev, iov->num_VFs, i); > > Maybe more of the fiddling could be hidden in the quirk, e.g., > > size = quirk_vf_bar_size(dev, iov->num_VFs, i); > if (!size) > size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); > > > virtfn->resource[i].start = res->start + size * id; > > - virtfn->resource[i].end = virtfn->resource[i].start + size - 1; > > + virtfn->resource[i].end = virtfn->resource[i].start + (size >> > > +(shift - 1)) - 1; > > rc = request_resource(res, &virtfn->resource[i]); > > BUG_ON(rc); > > } > > @@ -680,12 +690,12 @@ static int sriov_enable(struct pci_dev *dev, int > nr_virtfn) > > msleep(100); > > pci_cfg_access_unlock(dev); > > > > + iov->num_VFs = nr_virtfn; > > rc = sriov_add_vfs(dev, initial); > > if (rc) > > goto err_pcibios; > > > > kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE); > > - iov->num_VFs = nr_virtfn; > > > > return 0; > > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index > > 3d60cabde1a1..befc67a280eb 100644 > > --- a/drivers/pci/pci.h > > +++ b/drivers/pci/pci.h > > @@ -603,6 +603,21 @@ static inline int pci_dev_specific_reset(struct > > pci_dev *dev, bool probe) } #endif > > > > +struct virtfn_get_shift_methods { > > + u16 vendor; > > + u16 device; > > + u16 (*get_shift)(struct pci_dev *dev, u16 arg, int arg2); }; > > + > > +#ifdef CONFIG_PCI_QUIRKS > > +u16 virtfn_get_shift(struct pci_dev *dev, u16 arg1, int arg2); #else > > +static inline u16 virtfn_get_shift(struct pci_dev *dev, u16 arg1, int > > +arg2) { > > + return (u16)1; > > +} > > +#endif > > + > > #if defined(CONFIG_PCI_QUIRKS) && defined(CONFIG_ARM64) int > > acpi_get_rc_resources(struct device *dev, const char *hid, u16 segment, > > struct resource *res); > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index > > da829274fc66..add587919705 100644 > > --- a/drivers/pci/quirks.c > > +++ b/drivers/pci/quirks.c > > @@ -4085,6 +4085,43 @@ int pci_dev_specific_reset(struct pci_dev *dev, > bool probe) > > return -ENOTTY; > > } > > > > +static u16 divided_by_VF(struct pci_dev *dev, u16 num_VFs, int > > +bar_num) > > This is clearly ATI specific or at the very least specific to devices that divvy up > BAR0 in special ways, so the name is a bit too generic. > > > +{ > > + u16 shift = 1; > > + > > + if (bar_num == 0) { > > + while ((1 << shift) <= num_VFs) > > + shift += 1; > > + } > > + pci_info(dev, "BAR %d get shift: %d.\n", bar_num, shift); > > Drop the period at end. If we're changing the size, I think it would be useful > to know num_VFs, BAR #, and the new size. IIUC, "dev" here is the PF, so > this is the "VF BAR", not the BAR 0 of the PF. > > > + return shift; > > +} > > + > > +static const struct virtfn_get_shift_methods virtfn_get_shift_methods[] = > { > > + { PCI_VENDOR_ID_ATI, 0x73a1, divided_by_VF}, > > + { 0 } > > +}; > > + > > +/* > > + * Get shift num to calculate SR-IOV device BAR. Sometimes the BAR > > +size for > > + * SR-IOV device is too large and we want to calculate the size to > > +define > > + * the end of virtfn. > > + */ > > +u16 virtfn_get_shift(struct pci_dev *dev, u16 arg1, int arg2) { > > + const struct virtfn_get_shift_methods *i; > > + > > + for (i = virtfn_get_shift_methods; i->get_shift; i++) { > > + if ((i->vendor == dev->vendor || > > + i->vendor == (u16)PCI_ANY_ID) && > > + (i->device == dev->device || > > + i->device == (u16)PCI_ANY_ID)) > > + return i->get_shift(dev, arg1, arg2); > > + } > > + > > + return (u16)1; > > +} > > + > > static void quirk_dma_func0_alias(struct pci_dev *dev) { > > if (PCI_FUNC(dev->devfn) != 0) > > -- > > 2.25.1 > >