Re: [PATCH v4 1/2] PCI: Take other bus devices into account when distributing resources

Hi,

On Wed, Jan 04, 2023 at 04:48:09PM -0600, Bjorn Helgaas wrote:
> On Wed, Jan 04, 2023 at 11:16:34AM +0200, Mika Westerberg wrote:
> > A PCI bridge may reside on a bus with other devices as well. The
> > resource distribution code does not take this into account properly and
> > therefore it expands the bridge resource windows too much, not leaving
> > space for the other devices (or functions of a multifunction device) and
> > this leads to an issue that Jonathan reported. He runs QEMU with the
> > following topology (QEMU parameters):
> > 
> >  -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2  \
> >  -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on \
> >  -device e1000,bus=root_port13,addr=0.1                         \
> >  -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3    \
> >  -device e1000,bus=fun1
> > 
> > The first e1000 NIC here is another function in the switch upstream
> > port. This leads to following errors:
> > 
> >   pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
> >   pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
> >   pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
> >   e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> > 
> > Fix this by taking into account the device BARs on the bus, including
> > the ones belonging to bridges themselves.
> 
> I think it accounts for bridge windows (and VF BARs from SR-IOV
> Capabilities) as well as regular BARs, doesn't it?

Yes. Do you want me to mention them all here as well?

> > Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@xxxxxxxxxx/
> > Link: https://lore.kernel.org/linux-pci/6053736d-1923-41e7-def9-7585ce1772d9@xxxxxxxxxxxxx/
> > Reported-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > Reported-by: Alexander Motin <mav@xxxxxxxxxxxxx>
> > Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/pci/setup-bus.c | 205 +++++++++++++++++++++++++---------------
> >  1 file changed, 129 insertions(+), 76 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index b4096598dbcb..cf6a7bdf2427 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1750,6 +1750,7 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> >  				 resource_size_t new_size)
> >  {
> >  	resource_size_t add_size, size = resource_size(res);
> > +	resource_size_t min_size;
> >  
> >  	if (res->parent)
> >  		return;
> > @@ -1757,30 +1758,87 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> >  	if (!new_size)
> >  		return;
> >  
> > +	/* Minimum granularity of a bridge bridge window */
> 
> bridge bridge (fixed in my copy)

Right, thanks.

> > +	min_size = window_alignment(bridge->bus, res->flags);
> > +
> >  	if (new_size > size) {
> >  		add_size = new_size - size;
> > +		if (add_size < min_size)
> > +			return;
> >  		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> >  			&add_size);
> >  	} else if (new_size < size) {
> >  		add_size = size - new_size;
> > +		if (add_size < min_size)
> > +			return;
> >  		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> >  			&add_size);
> > +	} else {
> > +		return;
> >  	}
> >  
> >  	res->end = res->start + new_size - 1;
> >  	remove_from_list(add_list, res);
> >  }
> >  
> > +static void reduce_dev_resources(struct pci_dev *dev, struct resource *io,
> > +				 struct resource *mmio,
> > +				 struct resource *mmio_pref)
> 
> IIUC, this is removing the space consumed by "dev" from io, mmio,
> mmio_pref.

Yes.

> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> > +		struct resource *res = &dev->resource[i];
> > +		resource_size_t align, tmp, size;
> > +
> > +		size = resource_size(res);
> > +		if (!size)
> > +			continue;
> > +
> > +		align = pci_resource_alignment(dev, res);
> > +
> > +		if (resource_type(res) == IORESOURCE_IO) {
> > +			align = align ? ALIGN(io->start, align) - io->start : 0;
> > +			tmp = align + size;
> > +			io->start = min(io->start + tmp, io->end + 1);
> 
> Do you think it would be worth a helper to do this repeated
> calculation?  AFAICT they're all the same.
> 
>   static void remove_dev_res(struct resource *avail, struct pci_dev *dev, struct resource *res)
>   {
>     size = resource_size(res);
>     if (!size)
>       return;
> 
>     align = pci_resource_alignment(dev, res);
>     align = align ? ALIGN(avail->start, align) - avail->start : 0;
>     tmp = align + size;
>     avail->start = min(avail->start + tmp, avail->end + 1);
>   }
> 
>   if (resource_type(res) == IORESOURCE_IO)
>     remove_dev_res(io, dev, res);
>   else if (resource_type(res) == IORESOURCE_MEM) {
>     if (res->flags & IORESOURCE_PREFETCH)
>       remove_dev_res(mmio_pref, dev, res);
>     else
>       remove_dev_res(mmio, dev, res);
>   }

Yes, makes sense.
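For what it's worth, the padding arithmetic that helper would centralize can be sketched outside the kernel like this. This is a toy model only: the `resource` struct, `ALIGN()` macro and `remove_res()` name here are simplified stand-ins for the kernel's definitions, not the real API.

```c
#include <assert.h>	/* for the usage checks below */

/* Simplified stand-ins for the kernel types, for illustration only */
typedef unsigned long long resource_size_t;

struct resource {
	resource_size_t start;
	resource_size_t end;	/* inclusive, as in the kernel */
};

/* Round x up to the next multiple of a (a must be a power of two) */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((resource_size_t)(a) - 1))

static resource_size_t min_sz(resource_size_t a, resource_size_t b)
{
	return a < b ? a : b;
}

/*
 * Remove "size" bytes, plus any padding needed to reach "align", from
 * the front of the available window, clamping at the window's end.
 * align == 0 means the resource has no alignment set, so no padding.
 */
static void remove_res(struct resource *avail, resource_size_t size,
		       resource_size_t align)
{
	resource_size_t pad;

	if (!size)
		return;

	pad = align ? ALIGN(avail->start, align) - avail->start : 0;
	avail->start = min_sz(avail->start + pad + size, avail->end + 1);
}
```

For example, removing a 0x200-byte, 0x400-aligned resource from a window starting at 0x1100 first pads the start up to 0x1400 and then consumes the 0x200 bytes, leaving the window starting at 0x1600.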

> I don't quite understand how it works.  What's the case where "align"
> can be zero?  Will we remove VF BARs twice, once while processing the
> PF (where pf->resource[PCI_IOV_RESOURCES] contains space for BAR 0 of
> all the VFs), and again while processing the VFs?

pci_resource_alignment() returns 0 if the resource does not have an
alignment set (start/size alignment). This can happen at least for
bridge windows that have not yet been sized by pbus_size_mem()/io().
This is the original code that I moved here.

Regarding VFs, that's a good question. Should we skip the VFs here and
only account for the PF?

> > +		} else if (resource_type(res) == IORESOURCE_MEM) {
> > +			if (res->flags & IORESOURCE_PREFETCH) {
> > +				align = align ? ALIGN(mmio_pref->start, align) -
> > +						mmio_pref->start : 0;
> > +				tmp = align + size;
> > +				mmio_pref->start = min(mmio_pref->start + tmp,
> > +						       mmio_pref->end + 1);
> > +			} else {
> > +				align = align ? ALIGN(mmio->start, align) -
> > +						mmio->start : 0;
> > +				tmp = align + size;
> > +				mmio->start = min(mmio->start + tmp,
> > +						  mmio->end + 1);
> > +			}
> > +		}
> > +	}
> > +}
> > +
> > +/*
> > + * io, mmio and mmio_pref contain the total amount of bridge window
> > + * space available. This includes the minimal space needed to cover all
> > + * the existing devices on the bus and the possible extra space that can
> > + * be shared with the bridges.
> 
> So "mmio" is space available on "bus", i.e., this is a window that
> bus->self forwards to "bus" and is available for devices on "bus"?

Yes.

> This space would be available for devices on "bus", i.e., for BARs of
> those devices and windows of any bridges to devices below "bus"?

Yes.

> > + * The resource space consumed by bus->self (the bridge) is already
> > + * reduced.
> 
> I don't quite get what this is telling me.  If "mmio" is space
> forwarded to "bus" by bus->self, what does it mean for it to be
> reduced?  If bus->self forwards it, the entire space should be
> available on "bus".

I mean that if bus->self has BARs of its own, they are not included here
(they belong to the bus above this one). I can drop this comment
altogether if it is confusing.

> > + */
> >  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  					    struct list_head *add_list,
> >  					    struct resource io,
> >  					    struct resource mmio,
> >  					    struct resource mmio_pref)
> >  {
> > +	resource_size_t io_align, mmio_align, mmio_pref_align;
> > +	resource_size_t io_per_b, mmio_per_b, mmio_pref_per_b;
> >  	unsigned int normal_bridges = 0, hotplug_bridges = 0;
> >  	struct resource *io_res, *mmio_res, *mmio_pref_res;
> >  	struct pci_dev *dev, *bridge = bus->self;
> > -	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
> >  
> >  	io_res = &bridge->resource[PCI_BRIDGE_IO_WINDOW];
> >  	mmio_res = &bridge->resource[PCI_BRIDGE_MEM_WINDOW];
> > @@ -1790,17 +1848,17 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  	 * The alignment of this bridge is yet to be considered, hence it must
> >  	 * be done now before extending its bridge window.
> >  	 */
> > -	align = pci_resource_alignment(bridge, io_res);
> > -	if (!io_res->parent && align)
> > -		io.start = min(ALIGN(io.start, align), io.end + 1);
> > +	io_align = pci_resource_alignment(bridge, io_res);
> > +	if (!io_res->parent && io_align)
> > +		io.start = min(ALIGN(io.start, io_align), io.end + 1);
> >  
> > -	align = pci_resource_alignment(bridge, mmio_res);
> > -	if (!mmio_res->parent && align)
> > -		mmio.start = min(ALIGN(mmio.start, align), mmio.end + 1);
> > +	mmio_align = pci_resource_alignment(bridge, mmio_res);
> > +	if (!mmio_res->parent && mmio_align)
> > +		mmio.start = min(ALIGN(mmio.start, mmio_align), mmio.end + 1);
> >  
> > -	align = pci_resource_alignment(bridge, mmio_pref_res);
> > -	if (!mmio_pref_res->parent && align)
> > -		mmio_pref.start = min(ALIGN(mmio_pref.start, align),
> > +	mmio_pref_align = pci_resource_alignment(bridge, mmio_pref_res);
> > +	if (!mmio_pref_res->parent && mmio_pref_align)
> > +		mmio_pref.start = min(ALIGN(mmio_pref.start, mmio_pref_align),
> >  			mmio_pref.end + 1);
> 
> It looks like this hunk and the corresponding "Make sure the split
> resource space is properly aligned" one below could conceivably fix a
> separate problem all by themselves?  If so, could be split to a
> separate patch?

Sure.

> >  	/*
> > @@ -1824,94 +1882,89 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  			normal_bridges++;
> >  	}
> >  
> > -	/*
> > -	 * There is only one bridge on the bus so it gets all available
> > -	 * resources which it can then distribute to the possible hotplug
> > -	 * bridges below.
> > -	 */
> > -	if (hotplug_bridges + normal_bridges == 1) {
> > -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > -		if (dev->subordinate)
> > -			pci_bus_distribute_available_resources(dev->subordinate,
> > -				add_list, io, mmio, mmio_pref);
> > -		return;
> > -	}
> > -
> > -	if (hotplug_bridges == 0)
> > +	if (!(hotplug_bridges + normal_bridges))
> >  		return;
> >  
> >  	/*
> >  	 * Calculate the total amount of extra resource space we can
> > -	 * pass to bridges below this one.  This is basically the
> > -	 * extra space reduced by the minimal required space for the
> > -	 * non-hotplug bridges.
> > +	 * pass to bridges below this one.
> 
> The "space we can pass to bridges below this one" is the existing
> language, but I think it's a little too restrictive.  A bridge
> forwards space to the secondary bus, where it's available for any
> device, not just bridges.
> 
> Maybe:
> 
>   Calculate the amount of space we can forward from "bus" to any
>   downstream buses, i.e., the space left over after assigning the BARs
>   and windows on "bus".

Okay.

> > +	   This is basically the extra
> > +	 * space reduced by the minimal required space for the bridge
> > +	 * windows and device BARs on this bus.
> >  	 */
> > -	for_each_pci_bridge(dev, bus) {
> > -		resource_size_t used_size;
> > -		struct resource *res;
> > -
> > -		if (dev->is_hotplug_bridge)
> > -			continue;
> > -
> > -		/*
> > -		 * Reduce the available resource space by what the
> > -		 * bridge and devices below it occupy.
> > -		 */
> > -		res = &dev->resource[PCI_BRIDGE_IO_WINDOW];
> > -		align = pci_resource_alignment(dev, res);
> > -		align = align ? ALIGN(io.start, align) - io.start : 0;
> > -		used_size = align + resource_size(res);
> > -		if (!res->parent)
> > -			io.start = min(io.start + used_size, io.end + 1);
> > -
> > -		res = &dev->resource[PCI_BRIDGE_MEM_WINDOW];
> > -		align = pci_resource_alignment(dev, res);
> > -		align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> > -		used_size = align + resource_size(res);
> > -		if (!res->parent)
> > -			mmio.start = min(mmio.start + used_size, mmio.end + 1);
> > +	list_for_each_entry(dev, &bus->devices, bus_list)
> > +		reduce_dev_resources(dev, &io, &mmio, &mmio_pref);
> >  
> > -		res = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW];
> > -		align = pci_resource_alignment(dev, res);
> > -		align = align ? ALIGN(mmio_pref.start, align) -
> > -			mmio_pref.start : 0;
> > -		used_size = align + resource_size(res);
> > -		if (!res->parent)
> > -			mmio_pref.start = min(mmio_pref.start + used_size,
> > -				mmio_pref.end + 1);
> > +	/*
> > +	 * If there is at least one hotplug bridge on this bus it gets
> > +	 * all the extra resource space that was left after the
> > +	 * reductions above.
> > +	 *
> > +	 * If there are no hotplug bridges the extra resource space is
> > +	 * split between non-hotplug bridges. This is to allow possible
> > +	 * hotplug bridges below them to get the extra space as well.
> 
> I guess "hotplug_bridges" only counts bridges directly on "bus",
> right?  I.e., we don't look deeper in the hierarchy for downstream
> hotplug bridges?

Yes, correct.

> What happens in a topology like this:
> 
>   10:00.0 non-hotplug bridge to [bus 20-3f]
>   10:01.0 non-hotplug bridge to [bus 40]
>   20:00.0 hotplug bridge
>   40:00.0 NIC
> 
> where we're distributing space on "bus" 10, hotplug_bridges == 0 and
> normal_bridges == 2?  Do we give half the extra space to bus 20 and
> the other half to bus 40, even though we could tell up front that bus
> 20 is the only place that can actually use any extra space?

Yes, we split it in half.
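As a rough illustration of that even split (the `extra_per_bridge()` helper name and plain division here are an assumption for the sketch, not the kernel's actual arithmetic, which also accounts for window alignment):

```c
#include <assert.h>	/* for the usage checks below */

typedef unsigned long long resource_size_t;

/*
 * Illustrative only: when there are no hotplug bridges directly on the
 * bus, the leftover window space is divided evenly between the normal
 * bridges, whether or not the buses below them can actually use it.
 */
static resource_size_t extra_per_bridge(resource_size_t extra,
					unsigned int normal_bridges)
{
	return normal_bridges ? extra / normal_bridges : 0;
}
```

In the topology Bjorn describes above, 2 MB of extra space on bus 10 with normal_bridges == 2 would hand 1 MB each to buses 20 and 40, even though bus 20 (with the hotplug bridge) is the only one that can use it.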


