Re: [PATCH] PCI: Batch BAR sizing operations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 20 Jan 2025 18:10:14 +0200 (EET)
Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote:

> On Sat, 11 Jan 2025, Alex Williamson wrote:
> 
> > Toggling memory enable is free on bare metal, but potentially expensive
> > in virtualized environments as the device MMIO spaces are added and
> > removed from the VM address space, including DMA mapping of those spaces
> > through the IOMMU where peer-to-peer is supported.  Currently memory
> > decode is disabled around sizing each individual BAR, even for SR-IOV
> > BARs while VF Enable is cleared.
> > 
> > This can be better optimized for virtual environments by sizing a set
> > of BARs at once, stashing the resulting mask into an array, while only
> > toggling memory enable once.  This also naturally improves the SR-IOV
> > path as the caller becomes responsible for any necessary decode disables
> > while sizing BARs, therefore SR-IOV BARs are sized relying only on the
> > VF Enable rather than toggling the PF memory enable in the command
> > register.
> > 
> > Reported-by: Mitchell Augustin <mitchell.augustin@xxxxxxxxxxxxx>
> > Link: https://lore.kernel.org/all/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@xxxxxxxxxxxxxx
> > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > ---
> > 
> > This is an alternative to the patch proposed by Mitchell[1] and more
> > in line with my suggestion in the original report thread[2].  It makes
> > more sense to me to stash all the BAR sizing values into an array on
> > the front end of parsing them into resources than it does to pass
> > multiple arrays on the backend for status printing purposes.  We can
> > discuss in either of these patches which is the better approach or
> > if someone has a better yet alternative.
> > 
> > I don't have quite the config Mitchell has for testing, but this
> > should make effectively the same improvement and does show a
> > significant improvement in guest boot time even with a single 24GB
> > GPU attached.  There are of course further improvements to investigate
> > in the VMM, but disabling memory decode per BAR is a good start to
> > making Linux be a friendlier guest.  Further testing appreciate.
> > Thanks,
> > 
> > Alex
> > 
> > [1]https://lore.kernel.org/all/20241218224258.2225210-1-mitchell.augustin@xxxxxxxxxxxxx/
> > [2]https://lore.kernel.org/all/20241203150620.15431c5c.alex.williamson@xxxxxxxxxx/
> > 
> >  drivers/pci/iov.c   |   8 ++-
> >  drivers/pci/pci.h   |   4 +-
> >  drivers/pci/probe.c | 132 +++++++++++++++++++++++++++++---------------
> >  3 files changed, 97 insertions(+), 47 deletions(-)
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 4be402fe9ab9..9e4770cdd4d5 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -747,6 +747,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> >  	struct resource *res;
> >  	const char *res_name;
> >  	struct pci_dev *pdev;
> > +	u32 sriovbars[PCI_SRIOV_NUM_BARS];
> >  
> >  	pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl);
> >  	if (ctrl & PCI_SRIOV_CTRL_VFE) {
> > @@ -783,6 +784,10 @@ static int sriov_init(struct pci_dev *dev, int pos)
> >  	if (!iov)
> >  		return -ENOMEM;
> >  
> > +	/* Sizing SR-IOV BARs with VF Enable cleared - no decode */
> > +	__pci_size_stdbars(dev, PCI_SRIOV_NUM_BARS,
> > +			   pos + PCI_SRIOV_BAR, sriovbars);
> > +
> >  	nres = 0;
> >  	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
> >  		res = &dev->resource[i + PCI_IOV_RESOURCES];
> > @@ -796,7 +801,8 @@ static int sriov_init(struct pci_dev *dev, int pos)
> >  			bar64 = (res->flags & IORESOURCE_MEM_64) ? 1 : 0;
> >  		else
> >  			bar64 = __pci_read_base(dev, pci_bar_unknown, res,
> > -						pos + PCI_SRIOV_BAR + i * 4);
> > +						pos + PCI_SRIOV_BAR + i * 4,
> > +						&sriovbars[i]);
> >  		if (!res->flags)
> >  			continue;
> >  		if (resource_size(res) & (PAGE_SIZE - 1)) {
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 2e40fc63ba31..6f27651c2a70 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -315,8 +315,10 @@ bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
> >  int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *pl, int rrs_timeout);
> >  
> >  int pci_setup_device(struct pci_dev *dev);
> > +void __pci_size_stdbars(struct pci_dev *dev, int count,
> > +			unsigned int pos, u32 *sizes);
> >  int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
> > -		    struct resource *res, unsigned int reg);
> > +		    struct resource *res, unsigned int reg, u32 *sizes);
> >  void pci_configure_ari(struct pci_dev *dev);
> >  void __pci_bus_size_bridges(struct pci_bus *bus,
> >  			struct list_head *realloc_head);
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 2e81ab0f5a25..5ca96280d698 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -164,50 +164,75 @@ static inline unsigned long decode_bar(struct pci_dev *dev, u32 bar)
> >  
> >  #define PCI_COMMAND_DECODE_ENABLE	(PCI_COMMAND_MEMORY | PCI_COMMAND_IO)
> >  
> > +/**
> > + * __pci_size_bars - Read the raw BAR mask for a range of PCI BARs
> > + * @dev: the PCI device
> > + * @count: number of BARs to size
> > + * @pos: starting config space position
> > + * @sizes: array to store mask values
> > + * @rom: indicate whether to use ROM mask, which avoids enabling ROM BARs
> > + *
> > + * Provided @sizes array must be sufficiently sized to store results for
> > + * @count u32 BARs.  Caller is responsible for disabling decode to specified
> > + * BAR range around calling this function.  This function is intended to avoid
> > + * disabling decode around sizing each BAR individually, which can result in
> > + * non-trivial overhead in virtualized environments with very large PCI BARs.
> > + */
> > +static void __pci_size_bars(struct pci_dev *dev, int count,
> > +			    unsigned int pos, u32 *sizes, bool rom)
> > +{
> > +	u32 orig, mask = rom ? PCI_ROM_ADDRESS_MASK : ~0;
> > +	int i;
> > +
> > +	for (i = 0; i < count; i++, pos += 4, sizes++) {
> > +		pci_read_config_dword(dev, pos, &orig);
> > +		pci_write_config_dword(dev, pos, mask);
> > +		pci_read_config_dword(dev, pos, sizes);
> > +		pci_write_config_dword(dev, pos, orig);
> > +
> > +		/*
> > +		 * All bits set in size means the device isn't working properly.
> > +		 * If the BAR isn't implemented, all bits must be 0.  If it's a
> > +		 * memory BAR or a ROM, bit 0 must be clear; if it's an io BAR,
> > +		 * bit 1 must be clear.
> > +		 */
> > +		if (PCI_POSSIBLE_ERROR(*sizes))
> > +			*sizes = 0;
> > +	}
> > +}  
> 
> I'm trying to understand how 64-bit BARs are supposed to work here. The 
> *sizes is being filled inside this function for both lower and upper 
> u32, right? So why can PCI_POSSIBLE_ERROR() be used for the upper part? 
> Can't the upper u32 of a 64-bit BAR have all bits as 1?

Thanks for catching this, I think this check just needs to be moved
back to its original location below so that we only do this on the
lower u32.  I'll send a v2 with that change.  Thanks!

Alex

> > +void __pci_size_stdbars(struct pci_dev *dev, int count,
> > +			unsigned int pos, u32 *sizes)
> > +{
> > +	__pci_size_bars(dev, count, pos, sizes, false);
> > +}
> > +
> > +static void __pci_size_rom(struct pci_dev *dev, unsigned int pos, u32 *sizes)
> > +{
> > +	__pci_size_bars(dev, 1, pos, sizes, true);
> > +}
> > +
> >  /**
> >   * __pci_read_base - Read a PCI BAR
> >   * @dev: the PCI device
> >   * @type: type of the BAR
> >   * @res: resource buffer to be filled in
> >   * @pos: BAR position in the config space
> > + * @sizes: array of one or more pre-read BAR masks
> >   *
> >   * Returns 1 if the BAR is 64-bit, or 0 if 32-bit.
> >   */
> >  int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
> > -		    struct resource *res, unsigned int pos)
> > +		    struct resource *res, unsigned int pos, u32 *sizes)
> >  {
> > -	u32 l = 0, sz = 0, mask;
> > +	u32 l = 0;
> >  	u64 l64, sz64, mask64;
> > -	u16 orig_cmd;
> >  	struct pci_bus_region region, inverted_region;
> >  	const char *res_name = pci_resource_name(dev, res - dev->resource);
> >  
> > -	mask = type ? PCI_ROM_ADDRESS_MASK : ~0;
> > -
> > -	/* No printks while decoding is disabled! */
> > -	if (!dev->mmio_always_on) {
> > -		pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
> > -		if (orig_cmd & PCI_COMMAND_DECODE_ENABLE) {
> > -			pci_write_config_word(dev, PCI_COMMAND,
> > -				orig_cmd & ~PCI_COMMAND_DECODE_ENABLE);
> > -		}
> > -	}
> > -
> >  	res->name = pci_name(dev);
> >  
> >  	pci_read_config_dword(dev, pos, &l);
> > -	pci_write_config_dword(dev, pos, l | mask);
> > -	pci_read_config_dword(dev, pos, &sz);
> > -	pci_write_config_dword(dev, pos, l);
> > -
> > -	/*
> > -	 * All bits set in sz means the device isn't working properly.
> > -	 * If the BAR isn't implemented, all bits must be 0.  If it's a
> > -	 * memory BAR or a ROM, bit 0 must be clear; if it's an io BAR, bit
> > -	 * 1 must be clear.
> > -	 */
> > -	if (PCI_POSSIBLE_ERROR(sz))
> > -		sz = 0;
> >  
> >  	/*
> >  	 * I don't know how l can have all bits set.  Copied from old code.
> > @@ -221,35 +246,30 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
> >  		res->flags |= IORESOURCE_SIZEALIGN;
> >  		if (res->flags & IORESOURCE_IO) {
> >  			l64 = l & PCI_BASE_ADDRESS_IO_MASK;
> > -			sz64 = sz & PCI_BASE_ADDRESS_IO_MASK;
> > +			sz64 = *sizes & PCI_BASE_ADDRESS_IO_MASK;
> >  			mask64 = PCI_BASE_ADDRESS_IO_MASK & (u32)IO_SPACE_LIMIT;
> >  		} else {
> >  			l64 = l & PCI_BASE_ADDRESS_MEM_MASK;
> > -			sz64 = sz & PCI_BASE_ADDRESS_MEM_MASK;
> > +			sz64 = *sizes & PCI_BASE_ADDRESS_MEM_MASK;
> >  			mask64 = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> > +
> > +			if (res->flags & IORESOURCE_MEM_64) {
> > +				pci_read_config_dword(dev, pos + 4, &l);
> > +				sizes++;
> > +
> > +				l64 |= ((u64)l << 32);
> > +				sz64 |= ((u64)*sizes << 32);
> > +				mask64 |= ((u64)~0 << 32);
> > +			}
> >  		}
> >  	} else {
> >  		if (l & PCI_ROM_ADDRESS_ENABLE)
> >  			res->flags |= IORESOURCE_ROM_ENABLE;
> >  		l64 = l & PCI_ROM_ADDRESS_MASK;
> > -		sz64 = sz & PCI_ROM_ADDRESS_MASK;
> > +		sz64 = *sizes & PCI_ROM_ADDRESS_MASK;
> >  		mask64 = PCI_ROM_ADDRESS_MASK;
> >  	}
> >  
> > -	if (res->flags & IORESOURCE_MEM_64) {
> > -		pci_read_config_dword(dev, pos + 4, &l);
> > -		pci_write_config_dword(dev, pos + 4, ~0);
> > -		pci_read_config_dword(dev, pos + 4, &sz);
> > -		pci_write_config_dword(dev, pos + 4, l);
> > -
> > -		l64 |= ((u64)l << 32);
> > -		sz64 |= ((u64)sz << 32);
> > -		mask64 |= ((u64)~0 << 32);
> > -	}  
> 
> No PCI_POSSIBLE_ERROR() check here for the upper u32 in the previous 
> code.
> 






[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux