Heiner Kallweit wrote on 4/3/19 12:45 PM:
On 03.04.2019 15:14, Bjorn Helgaas wrote:
[+cc Frederick]
On Wed, Apr 03, 2019 at 07:53:40AM +0200, Heiner Kallweit wrote:
On 02.04.2019 23:57, Bjorn Helgaas wrote:
On Tue, Apr 02, 2019 at 10:41:20PM +0200, Heiner Kallweit wrote:
On 02.04.2019 22:16, Florian Fainelli wrote:
On 4/2/19 12:55 PM, Heiner Kallweit wrote:
There are numerous reports about different problems caused by ASPM
incompatibilities between certain network chip versions and board
chipsets. On the other hand on (especially mobile) systems where ASPM
works properly it can significantly contribute to power-saving and
increased battery runtime.
One problem so far to make ASPM configurable was to find an acceptable
way of configuration (e.g. module parameters are discouraged for that
purpose).
As a new attempt let's switch off ASPM per default and make it
configurable by a sysfs attribute. The attribute is documented in
new file Documentation/networking/device_drivers/realtek/r8169.txt.
Both module parameters and sysfs attributes are a poor user
experience. It's very difficult for users to figure out that
a tweak is needed.
I am not sure this is where it should be solved, there is
definitively a device specific aspect to properly supporting the
enabling of ASPM L0s, L1s etc, but the actual sysfs knobs should
belong to the pci_device itself, since this is something that
likely other drivers would want to be able to expose. You would
probably want to work with the PCI maintainers to come up with a
standard solution that applies beyond just r8169 since presumably
there must be a gazillion of devices with the same issues.
The Linux PCI core support for ASPM is poor. Without more details,
it's impossible to tell whether these issues are hardware or firmware
defects on the device itself, or something that Linux is doing wrong.
There are several known defects, especially related to L1 substates
and hotplug.
The vendor refuses to release datasheets or errata. Only certain
combinations of board chipsets (and maybe BIOS versions) and network
chip versions (from the ~ 50 supported by the driver) seem to be
affected. One typical symptom is missed RX packets, maybe the RX FIFO
isn't big enough to buffer all packets until PCIe has woken up.
The Windows vendor driver uses a hack, they dynamically disable ASPM
under load.
I'm not super sympathetic to vendors like that or to OEMs that work
with them. If we can make the NIC work reliably by disabling ASPM,
that's step one. If we can figure out how to extend battery life by
enabling ASPM in some cases, great, but we have to be careful to do it
in a way that is supportable and doesn't generate lots of user
complaints that require debugging.
That said, I think Frederick has already started working on a plan for
the PCI core to expose sysfs files to manage ASPM. This is similar to
the link_state files enabled by CONFIG_PCIEASPM_DEBUG, but it will be
always enabled and probably structured slightly differently. The idea
is that this would be generic and would not require any driver
support.
Thanks, Bjorn!
Frederick, is there anything you could share already? Or any timeline?
Heiner,
I've been on hold for this pet project for a bit. If you're willing to
take this one on, all the merrier :)
The plan is the sysfs files would sit on the endpoint devices, and given
a supported state value, configure the link(s) for the endpoints
upstream port to its immediate root port or switch downstream port.
Then, this allows something else handle the walking of the tree via the
ABI. We don't necessarily want to configure sibling links under a switch.
I'm currently traveling, and I can't get you patches until this weekend.
Based on Bjorns info what seems to be best to me:
1. Disable ASPM for r8169 on stable (back to 4.19).
2. Once the generic ASPM sysfs attributes are available, reenable ASPM
for r8169 in net-next.
Bjorn
Heiner
Frederick Lawler