On Mon, Dec 11, 2023 at 03:48:41PM -0600, Mario Limonciello wrote:
> On 12/8/2023 16:44, Bjorn Helgaas wrote:
> > On Fri, Dec 08, 2023 at 04:29:42PM -0600, Mario Limonciello wrote:
> > > On 12/8/2023 16:24, Bjorn Helgaas wrote:
> > > > On Wed, Aug 16, 2023 at 10:49:23AM +0530, Sanath S wrote:
> > > > > In the case of Thunderbolt, it contains a PCIe switch and one or
> > > > > more hotplug-capable PCIe downstream ports where the daisy chain
> > > > > can be extended.
> > > > >
> > > > > Currently when a Thunderbolt Dock is plugged in during S5/Reboot,
> > > > > System BIOS allocates a very minimal number of buses for bridges and
> > > > > hot-plug capable PCIe downstream ports to enumerate the dock during
> > > > > boot. Because of this, we run out of bus space pretty quickly when
> > > > > more PCIe devices are attached to hotplug downstream ports in order
> > > > > to extend the chain.
> > > > >
> > > > > Before:
> > > > >            +-04.0
> > > > >            +-04.1-[63-c1]----00.0-[64-69]--+-00.0-[65]--
> > > > >            |                               +-01.0-[66]--
> > > > >            |                               +-02.0-[67]--
> > > > >            |                               +-03.0-[68]--
> > > > >            |                               \-04.0-[69]--
> > > > >            +-08.0
> > > >
> > > > Looks like a clear issue here because there's no other use for
> > > > buses 70-c1.  But what would happen if there were more
> > > > hotplug-capable downstream ports, e.g., assume one at 08.1
> > > > leading to [bus c2-c7]?
> > > >
> > > > The 04.1 bridge has a lot of space, but 08.1 has very little.
> > > > With this patch, would we distribute it more evenly across
> > > > 04.1 and 08.1?  If not, I think we'll just have the same
> > > > problem when somebody plugs in a similar hierarchy at 08.1.
> > > >
> > > > > In case of a thunderbolt capable bridge, reconfigure the
> > > > > buses allocated by BIOS to the maximum available buses. So
> > > > > that the hot-plug bridges gets maximum buses and chain can
> > > > > be extended to accommodate more PCIe devices.  This fix is
> > > > > necessary for all the PCIe downstream ports where the daisy
> > > > > chain can be extended.
> > > > >
> > > > > After:
> > > > >            +-04.0
> > > > >            +-04.1-[63-c1]----00.0-[64-c1]--+-00.0-[65]--
> > > > >            |                               +-01.0-[66-84]--
> > > > >            |                               +-02.0-[85-a3]--
> > > > >            |                               +-03.0-[a4-c0]--
> > > > >            |                               \-04.0-[c1]--
> > > > >            +-08.0
> > > >
> > > > This doesn't look like anything specific to Thunderbolt; it's just
> > > > that we don't do a good job of reassigning bus numbers in general,
> > > > right?  We shouldn't just punt and say "BIOS should have done
> > > > something" because not all machines *have* BIOS, and the OS can
> > > > reconfigure bus numbers as needed.  The patch certainly isn't
> > > > Thunderbolt-specific.
> > >
> > > From the discussions Sanath and I have been in related to this issue
> > > the BIOS is pretty static with it's initialization under the
> > > presumption that the OS will rebalance things if necessary.
> > > ...
> >
> > > For this particular issue it's being approached a different way.
> > >
> > > Windows never rebalances things but doesn't suffer from this issue.
> > > That's because Windows actually does a "Downstream port reset" when
> > > it encounters a USB4 router.
> > >
> > > Sanath posted a quirk that aligned this behavior when encountering
> > > an AMD USB4 router, but as part of the discussion I suggested that
> > > we do it for everyone.
> > >
> > > https://lore.kernel.org/linux-usb/20231123065739.GC1074920@xxxxxxxxxxxxxxxxxx/
> > >
> > > So Sanath has a new patch that does this that is under testing right
> > > now and will be posted soon.
> >
> > Hmm, ok.  I don't know what a "downstream port reset" does or how it
> > resolves the bus number allocation issue, but I'm happy if you have a
> > fix that doesn't need PCI core changes.
>
> The issue is specifically with resources that were assigned with BIOS in
> this "static case".  The downstream port reset ends up resetting the
> topology and thus the resources get assigned by Linux instead and will
> be better balanced for more devices to be daisy chained.

It sounds like the downstream port reset maybe just resets the bridge
secondary/subordinate bus numbers, which forces Linux to reassign
them?  But Linux isn't smart enough to proactively reassign them?

If so, the reset sounds a little like a band-aid, not a real fix, but
I'm guessing nobody is signing up to rework that PCI core reassignment
code ;)

Bjorn
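
For reference, the bus-number arithmetic behind the Before/After trees
quoted above can be checked with a small sketch.  This only counts the
bus numbers in each [secondary-subordinate] range as printed by lspci -t;
it is not the kernel's allocation code, and the helper name bus_count and
the two dictionaries are made up for illustration.

```python
def bus_count(rng: str) -> int:
    """Count bus numbers in an lspci -t style range, e.g. "66-84" or "65" (hex)."""
    lo, _, hi = rng.partition("-")
    first = int(lo, 16)
    last = int(hi, 16) if hi else first  # a single number means one bus
    return last - first + 1


# Ranges copied from the quoted trees, keyed by downstream port (hypothetical layout).
before = {"00.0": "65", "01.0": "66", "02.0": "67", "03.0": "68", "04.0": "69"}
after = {"00.0": "65", "01.0": "66-84", "02.0": "85-a3", "03.0": "a4-c0", "04.0": "c1"}

root_port = bus_count("63-c1")     # buses behind 04.1
switch_before = bus_count("64-69")  # buses behind the switch upstream port, Before
switch_after = bus_count("64-c1")   # same bridge, After

print(f"04.1 has {root_port} buses; switch used {switch_before} before, {switch_after} after")
for port in before:
    print(f"  {port}: {bus_count(before[port])} -> {bus_count(after[port])}")
```

In the Before tree each downstream port has exactly one bus, so nothing
can be chained behind it; in the After tree the three middle ports share
almost all of the range that 04.1 already had.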