On Sat, Feb 02, 2019 at 04:25:02PM +0000, Nicholas Johnson wrote:
> New systems with Thunderbolt are starting to use native PCI enumeration.

To clarify, I assume "native PCI enumeration" means platform firmware
is not involved in hotplug, which means we would use pciehp (not
acpiphp) for these Thunderbolt hotplug events.

> Mika Westerberg's patch "PCI: Distribute available resources to hotplug
> capable PCIe downstream ports"
> (https://patchwork.kernel.org/patch/9972155/) adds code to expand
> downstream PCI hotplug bridges to consume all remaining resource space
> in the parent bridge. It is a crucial patch for supporting Thunderbolt
> native enumeration on Linux.
>
> However, it does not consider bridge alignment in all cases, which
> rules out hot-adding an external graphics processor at the end of a
> Thunderbolt daisy chain. Hot-adding such a device will likely fail to
> work with the existing code. It also might disrupt the operation of
> existing devices

Have you observed disruption of existing devices when hot-adding a new
device? That's a much more serious problem than if the new device
doesn't work. It will always be possible that we don't have enough
resources for a hot-added device, but it should never break a
previously working device.

If you've seen breakage of an existing device, I'd like to start with
the minimum patch that solves only *that*, with following patches that
improve the likelihood that hot-added devices will work.

It's always good to break up patches into the smallest logical units
(as long as none introduces a regression, i.e., after each patch, the
kernel must build and work as well as before). That minimum patch may
be a candidate for backporting, while the new functionality (improving
support for hot-added devices) may not be.

> or prevent the subsequent hot-plugging of lower-aligned devices if the
> kernel frees and reallocates upstream resources to attempt to assign
> the resources that failed to assign in the first pass. By Intel's
> ruling, Thunderbolt external graphics processors are generally meant
> to function only as the first and only device in the chain.

Do you have a reference for this? Some background on this would be
really useful in case those restrictions have any bearing on Linux.

> However, there is no technical reason that prevents it from being
> possible if sufficient resources are available, and people who own
> such hardware are likely to attempt such configurations with
> Thunderbolt devices. Hence, I argue that we should improve the user
> experience and reduce the number of constraints placed on the user by
> always considering resource alignment, which will make such
> configurations possible.
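For readers following along, here is a sketch of the alignment
arithmetic at issue (my illustration, not code from either patch;
ALIGN() and resource_size_t are the usual kernel definitions):

        /*
         * When a child window that requires alignment "align" is
         * placed at "start", the padding up to the next aligned
         * address is lost, so the child consumes more of the parent
         * window than its size alone.
         */
        static resource_size_t space_consumed(resource_size_t start,
                                              resource_size_t size,
                                              resource_size_t align)
        {
                return ALIGN(start, align) - start + size;
        }

For example, a hot-added GPU that needs a 256 MB window aligned to
256 MB, offered a parent region starting at 0x90100000, first burns
0x0ff00000 bytes of padding up to 0xa0000000; accounting by
resource_size() alone over-promises the available space.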
> The other problem with the patch is that it is incompatible with
> resources allocated by "pci=hpmemsize=nnM" due to a check which does
> not allow dev_res->add_size to be reduced. This check also makes a big
> assumption that hpmemsize is small but non-zero, and no action has
> been taken to ensure that. In the current state, any bridge smaller
> than hpmemsize will likely fail to size correctly, which will cause
> major issues if the default were to change, or if the user also wants
> to configure non-Thunderbolt hotplug bridges simultaneously. I argue
> that if assumptions and limitations can be removed with no risk and no
> adverse effects, then it should be done.
>
> The former problem is solved by rewriting the
> pci_bus_distribute_available_resources() function with more
> information passed in the arguments, eliminating assumptions about the
> initial bridge alignment. My patch makes no assumptions about the
> alignment of hot-added bridges, allowing for any device to be
> hot-added, given that sufficient resources are available.
>
> The latter problem is solved by removing the check preventing the
> shrinking of dev_res->add_size, which allows for the distribution of
> resources if hpmemsize is non-zero. It can be made to work with zero
> hpmemsize with two-line patches in pbus_size_mem() and pbus_size_io(),
> or by modifying extend_bridge_window() to add a new entry to the
> additional resource list if one does not exist. In theory, and by my
> testing, the removal of this check does not affect the functionality
> of the resource distribution with firmware-allocated resources. But it
> does enable the same functionality when using pci=hpmemsize=nnM, which
> was not possible before. This may be one piece of the puzzle needed to
> support Thunderbolt add-in cards that support native PCI enumeration,
> without any platform dependencies.

You mention two problems with (apparently) two solutions. If it's
possible to separate these into two (or more) patches, please do that.
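On the extend_bridge_window() alternative mentioned above, I guess the
idea is something like the following (an untested sketch, not part of
this patch; it assumes the static add_to_list() helper in setup-bus.c
keeps its current signature, and that a zero add_size entry is
acceptable):

        dev_res = res_to_dev_res(add_list, res);
        if (!dev_res) {
                /*
                 * No entry was queued earlier (e.g., hpmemsize=0), so
                 * create one now; bail out if that fails.
                 */
                if (add_to_list(add_list, bridge, res, 0,
                                pci_resource_alignment(bridge, res)))
                        return;
        }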
> I have tested this proposed patch extensively. Using Linux-allocated
> resources, it works perfectly. I have two Gigabyte GC-TITAN RIDGE
> Thunderbolt 3 add-in cards in my desktop, and a Dell XPS 9370 with the
> Dell XPS 9380 Thunderbolt NVM40 firmware flashed. My peripherals are
> three HP ZBook Thunderbolt 3 docks, two external graphics enclosures
> with AMD Fiji XT in both, a Promise SANLink3 N1 (AQC107S), and a Dell
> Thunderbolt 3 NVMe SSD. All configurations of these devices worked
> well, and I can no longer make it fail if I try to. My testing with
> firmware-allocated resources is limited due to using computers with
> Alpine Ridge BIOS support. However, I did manage to test the patch
> with firmware-allocated resources by enabling the Alpine Ridge BIOS
> support and forcing pcie_ports=native, and the results were perfect.
>
> Mika Westerberg has agreed to test this on an official platform with
> native enumeration firmware support to be sure that it works in this
> situation. It is also appropriate that he review these changes, as he
> wrote the original code, and any changes made to Thunderbolt-critical
> code cannot be allowed to break any of the base requirements for
> Thunderbolt. I would like to thank him for putting up with my emails
> and trying to help where he can, and for the great work he has done on
> Thunderbolt in Linux.
>
> I have more patches in the pipeline to further improve the Thunderbolt
> experience on Linux on platforms without BIOS support. This is the
> most technical but least user-facing one planned. The most exciting
> changes are yet to come.

Some of this text, especially the previous two paragraphs and the
following text, would better fit in a [0/N] cover letter because it
will not be useful after this work is merged, i.e., it won't be
interesting to people reading the git changelogs.

> Edits:
>
> I have made code styling changes as suggested by Mika Westerberg.
>
> I have been testing Thunderbolt devices with my other host card, which
> happens to be in SL0 mode. This means that devices are discovered much
> more quickly. I noticed that multiple devices can be enumerated
> together, rather than each getting enumerated before the next appears.
> It turns out that this can break the allocation, but I have absolutely
> no idea why. I have modified the patch to solve this problem. Before,
> extend_bridge_window() used add_size to change the resource size. Now
> it simply changes the size of the actual resource, and clears the
> add_size to zero if a list entry exists. That solves the issue, and
> proves that the calculated resource sizes are not at fault (the
> algorithm is sound). Hence, I recommend that this version, with the
> modification, be applied.
>
> I have removed Mika Westerberg's "Tested-by" line to allow him to give
> his approval for this new change.
>
> Observation: the fact that a single Thunderbolt dock can consume 1/4
> of the total I/O space (16-bit, 0xffff) is evidence that the
> deprecated I/O BARs need to be dropped from the kernel at some point.

I/O space is still used by some devices, so we can't drop it
completely. We can certainly improve Linux behavior when it's
exhausted, though. There are outstanding bugs in that area.

> The docks have four bridges each, with 4096-byte alignment. The I/O
> BARs do not do anything, crash the system if not claimed, and spam
> dmesg when not assigned.
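For the record, the arithmetic checks out: a PCI bridge I/O window has
4 KB granularity, so four bridges claim at least 4 * 0x1000 = 0x4000
(16 KB) per dock, out of a 16-bit (64 KB) total, which is exactly a
quarter of all I/O space.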
> Signed-off-by: Nicholas-Johnson-opensource <nicholas.johnson-opensource@xxxxxxxxxxxxxx>

The "Nicholas-Johnson-opensource" part should be your proper name. I
assume your name is "Nicholas Johnson", not
"Nicholas-Johnson-opensource"?

> ---
>  drivers/pci/setup-bus.c | 188 +++++++++++++++++++++-------------------
>  1 file changed, 99 insertions(+), 89 deletions(-)
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index ed960436df5e..09310b6fcdb3 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1859,27 +1859,34 @@ static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
>          if (res->parent)
>                  return;
>
> -        if (resource_size(res) >= available)
> -                return;
> +        /*
> +         * Hot-adding multiple Thunderbolt devices in SL0 might result in
> +         * multiple devices being enumerated together. This can break the
> +         * resource allocation if the resource sizes are specified with
> +         * add_size instead of simply changing the resource size.
> +         */
> +        pci_dbg(bridge, "bridge window %pR extended by 0x%016llx\n", res,
> +                available - resource_size(res));
> +        res->end = res->start + available - 1;
>
> +        /*
> +         * If a list entry exists, we need to remove any additional size
> +         * requested because that could interfere with the alignment and
> +         * sizing done when distributing resources, causing resources to
> +         * fail to allocate later on.
> +         */
>          dev_res = res_to_dev_res(add_list, res);
>          if (!dev_res)
>                  return;
>
> -        /* Is there room to extend the window? */
> -        if (available - resource_size(res) <= dev_res->add_size)
> -                return;
> -
> -        dev_res->add_size = available - resource_size(res);
> -        pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> -                &dev_res->add_size);
> +        dev_res->add_size = 0;
>  }
>
>  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> -                struct list_head *add_list, resource_size_t available_io,
> -                resource_size_t available_mmio, resource_size_t available_mmio_pref)
> +                struct list_head *add_list, struct resource io,
> +                struct resource mmio, struct resource mmio_pref)
>  {
> -        resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
> +        resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
>          unsigned int normal_bridges = 0, hotplug_bridges = 0;
>          struct resource *io_res, *mmio_res, *mmio_pref_res;
>          struct pci_dev *dev, *bridge = bus->self;
> @@ -1889,25 +1896,32 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>          mmio_pref_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
>
>          /*
> -         * Update additional resource list (add_list) to fill all the
> -         * extra resource space available for this port except the space
> -         * calculated in __pci_bus_size_bridges() which covers all the
> -         * devices currently connected to the port and below.
> +         * The alignment of this bridge is yet to be considered, hence it must
> +         * be done now before extending its bridge window. A single bridge
> +         * might not be able to occupy the whole parent region if the alignment
> +         * differs - for example, an external GPU at the end of a Thunderbolt
> +         * daisy chain.
>           */
> -        extend_bridge_window(bridge, io_res, add_list, available_io);
> -        extend_bridge_window(bridge, mmio_res, add_list, available_mmio);
> -        extend_bridge_window(bridge, mmio_pref_res, add_list,
> -                             available_mmio_pref);
> +        align = pci_resource_alignment(bridge, io_res);
> +        if (!io_res->parent && align)
> +                io.start = ALIGN(io.start, align);
> +
> +        align = pci_resource_alignment(bridge, mmio_res);
> +        if (!mmio_res->parent && align)
> +                mmio.start = ALIGN(mmio.start, align);
> +
> +        align = pci_resource_alignment(bridge, mmio_pref_res);
> +        if (!mmio_pref_res->parent && align)
> +                mmio_pref.start = ALIGN(mmio_pref.start, align);
>
>          /*
> -         * Calculate the total amount of extra resource space we can
> -         * pass to bridges below this one. This is basically the
> -         * extra space reduced by the minimal required space for the
> -         * non-hotplug bridges.
> +         * Update the resources to fill as much remaining resource space in the
> +         * parent bridge as possible, while considering alignment.
>           */
> -        remaining_io = available_io;
> -        remaining_mmio = available_mmio;
> -        remaining_mmio_pref = available_mmio_pref;
> +        extend_bridge_window(bridge, io_res, add_list, resource_size(&io));
> +        extend_bridge_window(bridge, mmio_res, add_list, resource_size(&mmio));
> +        extend_bridge_window(bridge, mmio_pref_res, add_list,
> +                             resource_size(&mmio_pref));
>
>          /*
>           * Calculate how many hotplug bridges and normal bridges there
> @@ -1921,80 +1935,79 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>                  normal_bridges++;
>          }
>
> +        /*
> +         * There is only one bridge on the bus so it gets all possible
> +         * resources which it can then distribute to the possible
> +         * hotplug bridges below.
> +         */
> +        if (hotplug_bridges + normal_bridges == 1) {
> +                dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> +                if (dev->subordinate)
> +                        pci_bus_distribute_available_resources(dev->subordinate,
> +                                add_list, io, mmio, mmio_pref);
> +                return;
> +        }
> +
> +        /*
> +         * Reduce the available resource space by what the
> +         * bridge and devices below it occupy.
> +         */
>          for_each_pci_bridge(dev, bus) {
> -                const struct resource *res;
> +                struct resource *res;
> +                resource_size_t used_size;
>
>                  if (dev->is_hotplug_bridge)
>                          continue;
>
> -                /*
> -                 * Reduce the available resource space by what the
> -                 * bridge and devices below it occupy.
> -                 */
>                  res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
> -                if (!res->parent && available_io > resource_size(res))
> -                        remaining_io -= resource_size(res);
> +                align = pci_resource_alignment(dev, res);
> +                align = align ? ALIGN(io.start, align) - io.start : 0;
> +                used_size = align + resource_size(res);
> +                if (!res->parent && used_size <= resource_size(&io))
> +                        io.start += used_size;
>
>                  res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
> -                if (!res->parent && available_mmio > resource_size(res))
> -                        remaining_mmio -= resource_size(res);
> +                align = pci_resource_alignment(dev, res);
> +                align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> +                used_size = align + resource_size(res);
> +                if (!res->parent && used_size <= resource_size(&mmio))
> +                        mmio.start += used_size;
>
>                  res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
> -                if (!res->parent && available_mmio_pref > resource_size(res))
> -                        remaining_mmio_pref -= resource_size(res);
> +                align = pci_resource_alignment(dev, res);
> +                align = align ? ALIGN(mmio_pref.start, align) -
> +                                mmio_pref.start : 0;
> +                used_size = align + resource_size(res);
> +                if (!res->parent && used_size <= resource_size(&mmio_pref))
> +                        mmio_pref.start += used_size;
>          }
>
> -        /*
> -         * There is only one bridge on the bus so it gets all available
> -         * resources which it can then distribute to the possible
> -         * hotplug bridges below.
> -         */
> -        if (hotplug_bridges + normal_bridges == 1) {
> -                dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -                if (dev->subordinate) {
> -                        pci_bus_distribute_available_resources(dev->subordinate,
> -                                add_list, available_io, available_mmio,
> -                                available_mmio_pref);
> -                }
> +        if (!hotplug_bridges)
>                  return;
> -        }
>
>          /*
> -         * Go over devices on this bus and distribute the remaining
> -         * resource space between hotplug bridges.
> +         * Distribute any remaining resources equally between
> +         * the hotplug-capable downstream ports.
>           */
> -        for_each_pci_bridge(dev, bus) {
> -                resource_size_t align, io, mmio, mmio_pref;
> -                struct pci_bus *b;
> +        io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
> +        mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
> +        mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> +                hotplug_bridges);
>
> -                b = dev->subordinate;
> -                if (!b || !dev->is_hotplug_bridge)
> +        for_each_pci_bridge(dev, bus) {
> +                if (!dev->subordinate || !dev->is_hotplug_bridge)
>                          continue;
>
> -                /*
> -                 * Distribute available extra resources equally between
> -                 * hotplug-capable downstream ports taking alignment into
> -                 * account.
> -                 *
> -                 * Here hotplug_bridges is always != 0.
> -                 */
> -                align = pci_resource_alignment(bridge, io_res);
> -                io = div64_ul(available_io, hotplug_bridges);
> -                io = min(ALIGN(io, align), remaining_io);
> -                remaining_io -= io;
> +                io.end = io.start + io_per_hp - 1;
> +                mmio.end = mmio.start + mmio_per_hp - 1;
> +                mmio_pref.end = mmio_pref.start + mmio_pref_per_hp - 1;
>
> -                align = pci_resource_alignment(bridge, mmio_res);
> -                mmio = div64_ul(available_mmio, hotplug_bridges);
> -                mmio = min(ALIGN(mmio, align), remaining_mmio);
> -                remaining_mmio -= mmio;
> +                pci_bus_distribute_available_resources(dev->subordinate,
> +                        add_list, io, mmio, mmio_pref);
>
> -                align = pci_resource_alignment(bridge, mmio_pref_res);
> -                mmio_pref = div64_ul(available_mmio_pref, hotplug_bridges);
> -                mmio_pref = min(ALIGN(mmio_pref, align), remaining_mmio_pref);
> -                remaining_mmio_pref -= mmio_pref;
> -
> -                pci_bus_distribute_available_resources(b, add_list, io, mmio,
> -                        mmio_pref);
> +                io.start = io.end + 1;
> +                mmio.start = mmio.end + 1;
> +                mmio_pref.start = mmio_pref.end + 1;
>          }
>  }
>
> @@ -2002,22 +2015,19 @@ static void
>  pci_bridge_distribute_available_resources(struct pci_dev *bridge,
>                                            struct list_head *add_list)
>  {
> -        resource_size_t available_io, available_mmio, available_mmio_pref;
> -        const struct resource *res;
> +        struct resource io_res, mmio_res, mmio_pref_res;
>
>          if (!bridge->is_hotplug_bridge)
>                  return;
>
> +        io_res = bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> +        mmio_res = bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> +        mmio_pref_res = bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> +
>          /* Take the initial extra resources from the hotplug port */
> -        res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> -        available_io = resource_size(res);
> -        res = &bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> -        available_mmio = resource_size(res);
> -        res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> -        available_mmio_pref = resource_size(res);
>
>          pci_bus_distribute_available_resources(bridge->subordinate,
> -                add_list, available_io, available_mmio, available_mmio_pref);
> +                add_list, io_res, mmio_res, mmio_pref_res);
>  }
>
>  void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> --
> 2.19.1
>
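To check my understanding of the new distribution loop (numbers
invented for illustration): with mmio covering 0x90000000-0x9fffffff
(256 MB) and hotplug_bridges == 2, mmio_per_hp is 128 MB. The first
port is offered 0x90000000-0x97ffffff, the cursor then advances via
mmio.start = mmio.end + 1, and the second port gets
0x98000000-0x9fffffff. Because each child re-aligns its own start
before extending its window, a child that needs stricter alignment
loses only padding inside its own share instead of encroaching on a
sibling's.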