On Wed, Nov 01, 2023 at 08:14:31PM -0500, Mario Limonciello wrote:
> On 11/1/2023 17:52, Bjorn Helgaas wrote:
> > On Tue, Oct 31, 2023 at 08:34:38AM -0500, Mario Limonciello wrote:
> > > The USB4 spec specifies that PCIe ports that are used for tunneling
> > > PCIe traffic over USB4 fabric will be hardcoded to advertise 2.5GT/s.
> > >
> > > In reality these ports' speed is controlled by the fabric
> > > implementation.
> >
> > So I guess you're saying the speed advertised by PCI_EXP_LNKSTA is not
> > the actual speed? And we don't have a generic way to find the actual
> > speed?
>
> Correct.
>
> > > Downstream drivers such as amdgpu which utilize
> > > pcie_bandwidth_available() to program the device will always find
> > > the PCIe ports used for tunneling as a limiting factor and may make
> > > incorrect decisions.
> > >
> > > To prevent problems in downstream drivers check explicitly for ports
> > > being used for PCIe tunneling and skip them when looking for
> > > bandwidth limitations.
> > >
> > > 2 types of devices are detected:
> > > 1) PCIe root port used for PCIe tunneling
> > > 2) Intel Thunderbolt 3 bridge
> > >
> > > Downstream drivers could make this change on their own but then they
> > > wouldn't be able to detect other potential speed bottlenecks.
> >
> > Is the implication that a tunneling port can *never* be a speed
> > bottleneck? That seems to be how this patch would work in practice.
>
> I think that's a stretch we should avoid concluding.

I'm just reading the description and the patch, which seem to say that
pcie_bandwidth_available() will never report a tunneling port as the
limiting port.

Maybe this can be rectified with a comment about how we can't tell the
actual bandwidth of a tunneled port, and it may be a hidden unreported
bottleneck, so pcie_bandwidth_available() can't actually return a
reliable result. Seems sort of unsatisfactory, but ... I dunno, maybe
it's the best we can do.

> IIUC the fabric can be hosting other traffic and it's entirely possible
> the traffic over the tunneling port runs more slowly at times.
>
> Perhaps that's why the USB4 spec decided to advertise it this way? I
> don't know.

Maybe, although the same happens on shared PCIe links above switches.

> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
> >
> > This issue says the external GPU doesn't work at all. Does this patch
> > fix that? This patch looks like it might improve GPU performance, but
> > wouldn't fix something that didn't work at all.
>
> The issue actually identified 4 distinct problems. The 3 functional
> problems will be fixed in amdgpu.
>
> This performance one was from later in the ticket after some back and
> forth identifying proper solutions for the first 3.

There's a lot of material in that report. Is there a way to link to
the specific part related to performance?

> > > + * This function excludes root ports and bridges used for USB4 and TBT3 tunneling.

Wrap to fit in 80 columns like the rest of the file.
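For example, something along these lines (exact wording up to you; it
could also fold in the caveat above about the result not being reliable
when a tunneled link is skipped):

 * This function excludes root ports and bridges used for USB4 and TBT3
 * tunneling; their advertised link speed is not the real fabric speed,
 * so the returned bandwidth may not account for a tunneled bottleneck.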
> > >   */
> > >  u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
> > >  			     enum pci_bus_speed *speed,
> > > @@ -6254,6 +6290,10 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
> > >  	bw = 0;
> > >  
> > >  	while (dev) {
> > > +		/* skip root ports and bridges used for tunneling */
> > > +		if (pcie_is_tunneling_port(dev))
> > > +			goto skip;
> > > +
> > >  		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> > >  
> > >  		next_speed = pcie_link_speed[lnksta & PCI_EXP_LNKSTA_CLS];
> > > @@ -6274,6 +6314,7 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
> > >  				*width = next_width;
> > >  		}
> > >  
> > > +skip:
> > >  		dev = pci_upstream_bridge(dev);
> > >  	}
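For readers without the full series: the pcie_is_tunneling_port() helper
this hunk relies on is added elsewhere in the patch and isn't quoted
above. Very roughly, the shape of such a check (only a sketch against
<linux/pci.h>, not the actual patch code; the existing is_thunderbolt
flag covers the TBT3 bridge case, and the USB4 root-port detection is
left as a placeholder here since that part isn't shown):

/*
 * Sketch only: the real pcie_is_tunneling_port() added by the patch is
 * not quoted above, and its detection logic may differ.
 */
static bool pcie_is_tunneling_port(struct pci_dev *dev)
{
	/* Intel Thunderbolt 3 bridges are flagged during enumeration */
	if (dev->is_thunderbolt)
		return true;

	/*
	 * Detecting a root port that carries a PCIe-over-USB4 tunnel is
	 * platform-specific and handled by the unquoted part of the
	 * patch; treat everything else as a normal link.
	 */
	return false;
}

Whatever the exact detection ends up being, the kernel-doc caveat above
about skipped links hiding a real bottleneck still seems worth having so
callers don't over-trust the returned bandwidth.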