On Thu, Aug 19, 2021 at 05:45:22PM +0200, Heiner Kallweit wrote: > On 19.08.2021 13:42, Bjorn Helgaas wrote: > > On Thu, Aug 19, 2021 at 01:45:40PM +0800, Kai-Heng Feng wrote: > >> r8169 NICs on some platforms have abysmal speed when ASPM is enabled. > >> Same issue can be observed with older vendor drivers. > > > > On some platforms but not on others? Maybe the PCIe topology is a > > factor? Do you have bug reports with data, e.g., "lspci -vv" output? > > > >> The issue is however solved by the latest vendor driver. There's a new > >> mechanism, which disables r8169's internal ASPM when the NIC traffic has > >> more than 10 packets, and vice versa. > > > > Presumably there's a time interval related to the 10 packets? For > > example, do you want to disable ASPM if 10 packets are received (or > > sent?) in a certain amount of time? > > > >> The possible reason for this is > >> likely because the buffer on the chip is too small for its ASPM exit > >> latency. > > > > Maybe this means the chip advertises incorrect exit latencies? If so, > > maybe a quirk could override that? > > > >> Realtek confirmed that all their PCIe LAN NICs, r8106, r8168 and r8125 > >> use dynamic ASPM under Windows. So implement the same mechanism here to > >> resolve the issue. > > > > What exactly is "dynamic ASPM"? > > > > I see Heiner's comment about this being intended only for a downstream > > kernel. But why? > > > We've seen various more or less obvious symptoms caused by the broken > ASPM support on Realtek network chips. Unfortunately Realtek releases > neither datasheets nor errata information. > Last time we attempted to re-enable ASPM numerous problem reports came > in. These Realtek chips are used on basically every consumer mainboard. > The proposed workaround has potential side effects: In case of a > congestion in the chip it may take up to a second until ASPM gets > disabled, what may affect performance, especially in case of alternating > traffic patterns. Also we can't expect support from Realtek. > Having said that my decision was that it's too risky to re-enable ASPM > in mainline even with this workaround in place. Kai-Heng weights the > power saving higher and wants to take the risk in his downstream kernel. > If there are no problems downstream after few months, then this > workaround may make it to mainline. Since ASPM apparently works well on some platforms but not others, I'd suspect some incorrect exit latencies. Ideally we'd have some launchpad/bugzilla links, and a better understanding of the problem, and maybe a quirk that makes this work on all platforms without mucking up the driver with ASPM tweaks. But I'm a little out of turn here because the only direct impact to the PCI core is the pcie_aspm_supported() interface. It *looks* like these patches don't actually touch the PCIe architected ASPM controls in Link Control; all I see is mucking with Realtek-specific registers. I think this is more work than it should be and likely to be not as reliable as it should be. But I guess that's up to you guys. Bjorn