[+cc Emmanuel, Rafael, Heiner, ancient ASPM history]

On Thu, May 11, 2023 at 10:58:40PM +0300, Ilpo Järvinen wrote:
> On Thu, 11 May 2023, Bjorn Helgaas wrote:
> > On Thu, May 11, 2023 at 08:35:48PM +0300, Ilpo Järvinen wrote:
> > > On Thu, 11 May 2023, Bjorn Helgaas wrote:
> > > > On Thu, May 11, 2023 at 04:14:25PM +0300, Ilpo Järvinen wrote:
> > > > > A few places write LNKCTL and LNKCTL2 registers without proper
> > > > > concurrency control and this could result in losing the changes
> > > > > one of the writers intended to make.
> > > > >
> > > > > Add pcie_capability_clear_and_set_word_locked() and helpers to use it
> > > > > with LNKCTL and LNKCTL2. The concurrency control is provided using a
> > > > > spinlock in the struct pci_dev.
> ...

[beginning of thread is
https://lore.kernel.org/r/20230511131441.45704-1-ilpo.jarvinen@xxxxxxxxxxxxxxx;
context here is that several drivers clear ASPM config directly,
probably because pci_disable_link_state() doesn't always do it]

> > Many of these are ASPM-related updates that IMHO should not be in
> > drivers at all.  Drivers should use PCI core interfaces so the core
> > doesn't get confused.
>
> Ah, yes. I forgot to mention it in the cover letter but I noticed that
> some of those seem to be workarounds for the cases where core refuses to
> disable ASPM. Some sites even explicit have a comment about that after
> the call to pci_disable_link_state():
>
> static void bcm4377_disable_aspm(struct bcm4377_data *bcm4377)
> {
>         pci_disable_link_state(bcm4377->pdev,
>                                PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
>
>         /*
>          * pci_disable_link_state can fail if either CONFIG_PCIEASPM is disabled
>          * or if the BIOS hasn't handed over control to us. We must *always*
>          * disable ASPM for this device due to hardware errata though.
>          */
>         pcie_capability_clear_word(bcm4377->pdev, PCI_EXP_LNKCTL,
>                                    PCI_EXP_LNKCTL_ASPMC);
> }
>
> That kinda feels something that would want a force disable quirk that is
> reliable. There are quirks for some devices which try to disable it but
> could fail for reasons mentioned in that comment. (But I'd prefer to make
> another series out of it rather than putting it into this one.)
>
> It might even be that some drivers don't even bother to make the
> pci_disable_link_state() call because it isn't reliable enough.

Yeah, I noticed that this is problematic.

We went round and round about this ten years ago [1], which resulted
in https://git.kernel.org/linus/2add0ec14c25 ("PCI/ASPM: Warn when
driver asks to disable ASPM, but we can't do it").

I'm not 100% convinced by that anymore.  It's true that if firmware
retains control of the PCIe capability, the OS is technically not
allowed to write to it, and it's conceivable that even a locked OS
update could collide with some SMI or something that also writes to
it.

I can certainly imagine that firmware might know that *enabling* ASPM
might break because of signal integrity issues or something.  It seems
less likely that *disabling* ASPM would break something, but Rafael
[2] and Matthew [3] rightly pointed out that there is some risk.

But the current situation, where pci_disable_link_state() does nothing
if CONFIG_PCIEASPM is unset or if _OSC says firmware owns it, leads to
drivers doing it directly anyway.  I'm not sure that's better than
making pci_disable_link_state() work 100% of the time, regardless of
CONFIG_PCIEASPM and _OSC.  At least then the PCI core would know
what's going on.

Bjorn

[1] https://lore.kernel.org/all/CANUX_P3F5YhbZX3WGU-j1AGpbXb_T9Bis2ErhvKkFMtDvzatVQ@xxxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/1725435.3DlCxYF2FV@xxxxxxxxxxxxxx/
[3] https://lore.kernel.org/all/1368303730.2425.47.camel@x230/
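
For readers who want the lost-update problem from the quoted patch
description spelled out, here is a rough userspace sketch. It is not
kernel code and not the implementation from the series: struct fake_dev,
the fake_* helpers and the FAKE_* names are hypothetical stand-ins, and
a C11 atomic_flag only models the spinlock that the series puts in
struct pci_dev. The two bit masks are assumed to match the Link Control
layout (ASPM Control in bits 1:0, Retrain Link in bit 5).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the kernel's struct pci_dev. */
struct fake_dev {
	atomic_flag lock;	/* models the per-device spinlock */
	uint16_t lnkctl;	/* models the Link Control register */
};

#define FAKE_LNKCTL_ASPMC	0x0003	/* ASPM Control field, bits 1:0 */
#define FAKE_LNKCTL_RL		0x0020	/* Retrain Link, bit 5 */

static void fake_lock(struct fake_dev *dev)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&dev->lock,
						 memory_order_acquire))
		;
}

static void fake_unlock(struct fake_dev *dev)
{
	atomic_flag_clear_explicit(&dev->lock, memory_order_release);
}

/* Read-modify-write done entirely under the lock. */
static void fake_clear_and_set_word_locked(struct fake_dev *dev,
					   uint16_t clear, uint16_t set)
{
	uint16_t val;

	assert(dev);
	fake_lock(dev);
	val = dev->lnkctl;
	val &= (uint16_t)~clear;
	val |= set;
	dev->lnkctl = val;
	fake_unlock(dev);
}

/*
 * Replays, in a single thread, the interleaving that unlocked writers
 * can hit: both read the register before either writes it back.
 */
static uint16_t fake_lost_update(uint16_t initial)
{
	uint16_t reg = initial;
	uint16_t a = reg;	/* writer A reads */
	uint16_t b = reg;	/* writer B reads the same old value */

	a &= (uint16_t)~FAKE_LNKCTL_ASPMC;
	reg = a;		/* A writes back: ASPM disabled */
	b |= FAKE_LNKCTL_RL;
	reg = b;		/* B writes back stale bits: ASPM is on again */
	return reg;
}
```

In this model, a register that starts with both ASPM bits set ends up
with them set again after the unlocked interleaving, so writer A's
disable is silently lost. With the locked helper the two updates
serialize and both take effect. Note that this only covers OS-side
writers; it says nothing about firmware (SMI) writes to the same
register.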