On Fri, Oct 01, 2021 at 12:17:26PM +0800, Kai-Heng Feng wrote:
> On Sat, Sep 18, 2021 at 6:09 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Thu, Sep 16, 2021 at 11:44:14PM +0800, Kai-Heng Feng wrote:
> > > The purpose of the series is to get comments and reviews so we can merge
> > > and test the series in downstream kernel.
> > >
> > > The latest Realtek vendor driver and its Windows driver implements a
> > > feature called "dynamic ASPM" which can improve performance on it's
> > > ethernet NICs.
> > >
> > > Heiner Kallweit pointed out the potential root cause can be that the
> > > buffer is too small for its ASPM exit latency.
> >
> > I looked at the lspci data in your bugzilla
> > (https://bugzilla.kernel.org/show_bug.cgi?id=214307).
> >
> > L1.2 is enabled, which requires the Latency Tolerance Reporting
> > capability, which helps determine when the Link will be put in L1.2.
> > IIUC, these are analogous to the DevCap "Acceptable Latency" values.
> > Zero latency values indicate the device will be impacted by any delay
> > (PCIe r5.0, sec 6.18).
> >
> > Linux does not currently program those values, so the values there
> > must have been set by the BIOS. On the working AMD system, they're
> > set to 1048576ns, while on the broken Intel system, they're set to
> > 3145728ns.
> >
> > I don't really understand how these values should be computed, and I
> > think they depend on some electrical characteristics of the Link, so
> > I'm not sure it's *necessarily* a problem that they are different.
> > But a 3X difference does seem pretty large.
> >
> > So I'm curious whether this is related to the problem. Here are some
> > things we could try on the broken Intel system:
>
> Original network speed, tested via iperf3:
> TX: ~255 Mbps
> RX: ~490 Mbps
>
> >   - What happens if you disable ASPM L1.2 using
> >     /sys/devices/pci*/.../link/l1_2_aspm?
>
> TX: ~670 Mbps
> RX: ~670 Mbps

Do you remember if there were any dropped packets here?
You mentioned at [1] that you have also seen reports of issues with
L0s and L1.1. If you disable L1.2, L0s and L1.1 *should* still be
enabled.

[1] https://lore.kernel.org/r/CAAd53p4v+CmupCu2+3vY5N64WKkxcNvpk1M7+hhNoposx+aYCg@xxxxxxxxxxxxxx
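
For reference, here is a minimal sketch of how the two latency values
quoted above (1048576ns and 3145728ns) could decode from an LTR Max
Snoop / Max No-Snoop Latency register. It assumes the register layout
from the PCIe LTR Extended Capability: a 10-bit value in bits 9:0 and a
3-bit scale in bits 12:10, with latency = value * 2^(5 * scale) ns. The
function name and the raw register contents (0x1001, 0x1003) are
hypothetical; the thread only gives the decoded numbers, and other
value/scale pairs can produce the same result.

```python
def ltr_decode_ns(reg: int) -> int:
    """Decode a 16-bit LTR max-latency register into nanoseconds."""
    value = reg & 0x3FF          # bits 9:0, latency value
    scale = (reg >> 10) & 0x7    # bits 12:10, latency scale
    if scale > 5:
        # Scale encodings 6 and 7 are not permitted.
        raise ValueError(f"invalid LTR scale encoding: {scale}")
    # Each scale step multiplies by 32: 1, 32, 1024, 32768, 1048576, ... ns
    return value << (5 * scale)


# Hypothetical raw encodings (assumptions, not taken from the bugzilla data).
amd_reg = (4 << 10) | 1      # value 1 at scale 4
intel_reg = (4 << 10) | 3    # value 3 at scale 4

amd_ns = ltr_decode_ns(amd_reg)
intel_ns = ltr_decode_ns(intel_reg)
print(amd_ns, intel_ns, intel_ns // amd_ns)
```

Under that assumption both systems would use the same scale and differ
only in the value field, which is one way to read the 3X difference.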