Re: [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 26, 2018 at 03:23:33PM +0300, Mika Westerberg wrote:
> On Wed, Apr 25, 2018 at 05:38:54PM -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 16, 2018 at 01:34:46PM +0300, Mika Westerberg wrote:
> > > When hot-adding a PCIe switch the way we currently distribute resources
> > > does not always work well because devices connected to the switch might
> > > need to have their MMIO resources aligned to something else than the
> > > default 1 MB boundary. For example Intel Gigabit ET2 quad port server
> > > adapter includes PCIe switch leading to 4 x GbE NIC devices that want
> > > to have their MMIO resources aligned to 2 MB boundary instead.
> > > 
> > > The current resource distribution code does not take this alignment into
> > > account and might try to add too much resources for the extension
> > > hotplug bridge(s). The resulting bridge window is too big which makes
> > > the resource assignment operation fail, and we are left with a bridge
> > > window with minimal amount (1 MB) of MMIO space.
> > > 
> > > Here is what happens when an Intel Gigabit ET2 quad port server adapter
> > > is hot-added:
> > > 
> > >   pci 0000:39:00.0: BAR 14: assigned [mem 0x53300000-0x6a0fffff]
> > >                                           ^^^^^^^^^^
> > >   pci 0000:3a:01.0: BAR 14: assigned [mem 0x53400000-0x547fffff]
> > >                                           ^^^^^^^^^^
> > > The above shows that the downstream bridge (3a:01.0) window is aligned
> > > to 2 MB instead of 1 MB as is the upstream bridge (39:00.0) window. The
> > > remaining MMIO space (0x15a00000) is assigned to the hotplug bridge
> > > (3a:04.0) but it fails:
> > > 
> > >   pci 0000:3a:04.0: BAR 14: no space for [mem size 0x15a00000]
> > >   pci 0000:3a:04.0: BAR 14: failed to assign [mem size 0x15a00000]
> > > 
> > > The MMIO resource is calculated as follows:
> > > 
> > >   start = 0x54800000
> > >   end = 0x54800000 + 0x15a00000 - 1 = 0x6a1fffff
> > > 
> > > This results bridge window [mem 0x54800000 - 0x6a1fffff] and it ends
> > > after the upstream bridge window [mem 0x53300000-0x6a0fffff] explaining
> > > the above failure. Because of this Linux falls back to the default
> > > allocation of 1 MB as can be seen from 'lspci' output:
> > > 
> > >  39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
> > >    3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
> > >    3a:04.0 Memory behind bridge: 53300000-533fffff [size=1M]
> > > 
> > > The hotplug bridge 3a:04.0 only occupies 1 MB MMIO window which is
> > > clearly not enough for extending the PCIe topology later if more devices
> > > are to be hot-added.
> > > 
> > > Fix this by substracting properly aligned non-hotplug downstream bridge
> > > window size from the remaining resources used for extension. After this
> > > change the resource allocation looks like:
> > > 
> > >   39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
> > >     3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
> > >     3a:04.0 Memory behind bridge: 54800000-6a0fffff [size=345M]
> > > 
> > > This matches the expectation. All the extra MMIO resource space (345 MB)
> > > is allocated to the extension hotplug bridge (3a:04.0).
> > 
> > Sorry, I've spent a lot of time trying to trace through this code, and
> > I'm still hopelessly confused.  Can you post the complete "lspci -vv"
> > output and the dmesg log (including the hot-add event) somewhere and
> > include a URL to it?
> 
> I sent you the logs and lspci output both with and without this patch
> when I connect a full chain of 6 Thunderbolt devices where 3 of them
> include those NICs with 4 ethernet ports. The resulting topology
> includes total of 6 + 3 + 1 PCIe switches.

Thanks, I opened https://bugzilla.kernel.org/show_bug.cgi?id=199581
and attached the info you sent.

> > I think I understand the problem you're solving:
> > 
> >   - You have 366M, 1M-aligned, available for things on bus 3a
> >   - You assign 20M, 2M-aligned to 3a:01.0
> >   - This leaves 346M for other things on bus 3a, but it's not all
> >     contiguous because the 20M is in the middle.
> >   - The remaining 346M might be 1M on one side and 345M on the other
> >     (and there are many other possibilities, e.g., 3M + 343M, 5M +
> >     341M, ..., 345M + 1M).
> >   - The current code tries to assign all 346M to 3a:04.0, which
> >     fails because that space is not contiguous, so it falls back to
> >     allocating 1M, which works but is insufficient for future
> >     hot-adds.
> 
> My understanding is that the 20M is aligned to 2M so we need to take
> that into account when we distribute the remaining space which makes it
> 345 instead of 346 which it would be without the alignment.

I think that's what I said above, or did I miss something?

> > Obviously this patch makes *this* situation work: it assigns 345M to
> > 3a:04.0 and (I assume) leaves the 1M unused.  But I haven't been able
> > to convince myself that this patch works *in general*.
> 
> I've tested this patch with full chain of devices with all my three
> Intel Gigabit ET2 quad port server adapters connected there along with
> other devices and the issue does not happen.
> 
> > For example, what if we assigned the 20M from the end of the 366M
> > window instead of the beginning, so the 345M piece is below the 20M
> > and there's 1M left above it?  That is legal and should work, but I
> > suspect this patch would ignore the 345M piece and again assign 1M to
> > 3a:04.0.
> 
> It should work so that it first allocates resources for the non-hotplug
> bridges and after that everything else is put to hotplug bridges.
> 
> > Or what if there are several hotplug bridges on bus 3a?  This example
> > has two, but there could be many more.
> > 
> > Or what if there are normal bridges as well as hotplug bridges on bus
> > 3a?  Or if they're in arbitrary orders?
> 
> Thunderbolt host router with two ports has such configuration where
> there are two hotplug ports and two normal ports (there could be more)
> and it is hot-added as well. At least that works. With the other
> arbitrary scenarios, it is hard to say without actually testing it on a
> real hardware.

This is where it gets hard for me -- I'm not really comfortable if we
have to convince ourselves that code is correct by testing every
scenario.  It's a lot better if we can convince ourselves by reasoning
about what the code does.  That's not very reliable either, but if we
understand the code, we at least have a hope of being able to fix the
bugs we missed in our reasoning.

> Also I'm fine dropping this patch altogether and just file a kernel
> bugzilla with this information attached. Maybe someone else can provide
> a better fix eventually. This is not really common situation anyway
> because typically you have only PCIe endpoints included in a Thunderbolt
> device (not PCIe switches with a bunch of endpoints connected).
> Furthermore, I tried the same in Windows and it does not handle it
> properly either ;-)

OK, I opened the bugzilla and attached the info.  Thanks!



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux