On Sat, Aug 21, 2021 at 5:03 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > On Thu, Aug 19, 2021 at 05:45:22PM +0200, Heiner Kallweit wrote: > > On 19.08.2021 13:42, Bjorn Helgaas wrote: > > > On Thu, Aug 19, 2021 at 01:45:40PM +0800, Kai-Heng Feng wrote: > > >> r8169 NICs on some platforms have abysmal speed when ASPM is enabled. > > >> Same issue can be observed with older vendor drivers. > > > > > > On some platforms but not on others? Maybe the PCIe topology is a > > > factor? Do you have bug reports with data, e.g., "lspci -vv" output? > > > > > >> The issue is however solved by the latest vendor driver. There's a new > > >> mechanism, which disables r8169's internal ASPM when the NIC traffic has > > >> more than 10 packets, and vice versa. > > > > > > Presumably there's a time interval related to the 10 packets? For > > > example, do you want to disable ASPM if 10 packets are received (or > > > sent?) in a certain amount of time? > > > > > >> The possible reason for this is > > >> likely because the buffer on the chip is too small for its ASPM exit > > >> latency. > > > > > > Maybe this means the chip advertises incorrect exit latencies? If so, > > > maybe a quirk could override that? > > > > > >> Realtek confirmed that all their PCIe LAN NICs, r8106, r8168 and r8125 > > >> use dynamic ASPM under Windows. So implement the same mechanism here to > > >> resolve the issue. > > > > > > What exactly is "dynamic ASPM"? > > > > > > I see Heiner's comment about this being intended only for a downstream > > > kernel. But why? > > > > > We've seen various more or less obvious symptoms caused by the broken > > ASPM support on Realtek network chips. Unfortunately Realtek releases > > neither datasheets nor errata information. > > Last time we attempted to re-enable ASPM numerous problem reports came > > in. These Realtek chips are used on basically every consumer mainboard. > > The proposed workaround has potential side effects: In case of a > > congestion in the chip it may take up to a second until ASPM gets > > disabled, what may affect performance, especially in case of alternating > > traffic patterns. Also we can't expect support from Realtek. > > Having said that my decision was that it's too risky to re-enable ASPM > > in mainline even with this workaround in place. Kai-Heng weights the > > power saving higher and wants to take the risk in his downstream kernel. > > If there are no problems downstream after few months, then this > > workaround may make it to mainline. > > Since ASPM apparently works well on some platforms but not others, I'd > suspect some incorrect exit latencies. Can be, but if their dynamic ASPM mechanism can workaround the issue, maybe their hardware is just designed that way? > > Ideally we'd have some launchpad/bugzilla links, and a better > understanding of the problem, and maybe a quirk that makes this work > on all platforms without mucking up the driver with ASPM tweaks. The tweaks is OS-agnostic and is also implemented in Windows. > > But I'm a little out of turn here because the only direct impact to > the PCI core is the pcie_aspm_supported() interface. It *looks* like > these patches don't actually touch the PCIe architected ASPM controls > in Link Control; all I see is mucking with Realtek-specific registers. AFAICT, Realtek ethernet NIC and wireless NIC both have two layers of ASPM, one is the regular PCIe ASPM, and a Realtek specific internal ASPM. Both have to be enabled to really make ASPM work for them. Kai-Heng > > I think this is more work than it should be and likely to be not as > reliable as it should be. But I guess that's up to you guys. > > Bjorn