[+cc Frederick] On Wed, Apr 03, 2019 at 07:53:40AM +0200, Heiner Kallweit wrote: > On 02.04.2019 23:57, Bjorn Helgaas wrote: > > On Tue, Apr 02, 2019 at 10:41:20PM +0200, Heiner Kallweit wrote: > >> On 02.04.2019 22:16, Florian Fainelli wrote: > >>> On 4/2/19 12:55 PM, Heiner Kallweit wrote: > >>>> There are numerous reports about different problems caused by ASPM > >>>> incompatibilities between certain network chip versions and board > >>>> chipsets. On the other hand on (especially mobile) systems where ASPM > >>>> works properly it can significantly contribute to power-saving and > >>>> increased battery runtime. > >>>> One problem so far to make ASPM configurable was to find an acceptable > >>>> way of configuration (e.g. module parameters are discouraged for that > >>>> purpose). > >>>> > >>>> As a new attempt let's switch off ASPM per default and make it > >>>> configurable by a sysfs attribute. The attribute is documented in > >>>> new file Documentation/networking/device_drivers/realtek/r8169.txt. > > > > Both module parameters and sysfs attributes are a poor user > > experience. It's very difficult for users to figure out that > > a tweak is needed. > > > >>> I am not sure this is where it should be solved, there is > >>> definitively a device specific aspect to properly supporting the > >>> enabling of ASPM L0s, L1s etc, but the actual sysfs knobs should > >>> belong to the pci_device itself, since this is something that > >>> likely other drivers would want to be able to expose. You would > >>> probably want to work with the PCI maintainers to come up with a > >>> standard solution that applies beyond just r8169 since presumably > >>> there must be a gazillion of devices with the same issues. > > > > The Linux PCI core support for ASPM is poor. Without more details, > > it's impossible to tell whether these issues are hardware or firmware > > defects on the device itself, or something that Linux is doing wrong. > > There are several known defects, especially related to L1 substates > > and hotplug. > > > The vendor refuses to release datasheets or errata. Only certain > combinations of board chipsets (and maybe BIOS versions) and network > chip versions (from the ~ 50 supported by the driver) seem to be > affected. One typical symptom is missed RX packets, maybe the RX FIFO > isn't big enough to buffer all packets until PCIe has woken up. > The Windows vendor driver uses a hack, they dynamically disable ASPM > under load. I'm not super sympathetic to vendors like that or to OEMs that work with them. If we can make the NIC work reliably by disabling ASPM, that's step one. If we can figure out how to extend battery life by enabling ASPM in some cases, great, but we have to be careful to do it in a way that is supportable and doesn't generate lots of user complaints that require debugging. That said, I think Frederick has already started working on a plan for the PCI core to expose sysfs files to manage ASPM. This is similar to the link_state files enabled by CONFIG_PCIEASPM_DEBUG, but it will be always enabled and probably structured slightly differently. The idea is that this would be generic and would not require any driver support. Bjorn