On Mon, Oct 25, 2021 at 06:45:29PM +0200, Jonas Dreßler wrote:
> On 10/18/21 17:35, Bjorn Helgaas wrote:
> > On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dreßler wrote:
> > > On 10/12/21 17:39, Bjorn Helgaas wrote:
> > > > [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread:
> > > > https://lore.kernel.org/all/20211011134238.16551-1-verdre@xxxxxxx/]
> > > >
> > > > I wonder if this reset quirk works because pci_reset_function() saves
> > > > and restores much of config space, but it currently does *not* restore
> > > > the L1 PM Substates capability, so those T_POWER_ON,
> > > > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get
> > > > cleared out by the reset. We did briefly save/restore it [1], but we
> > > > had to revert that because of a regression that AFAIK was never
> > > > resolved [2]. I expect we will eventually save/restore this, so if
> > > > the quirk depends on it *not* being restored, that would be a problem.
> > > >
> > > > You should be able to test whether this is the critical thing by
> > > > clearing those registers with setpci instead of doing the reset. Per
> > > > spec, they can only be modified when L1.2 is disabled, so you would
> > > > have to disable it via sysfs (for the endpoint, I think)
> > > > /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root
> > > > port, then re-enable L1.2.
> > > >
> > > > [1] https://git.kernel.org/linus/4257f7e008ea
> > > > [2] https://lore.kernel.org/all/20210127160449.2990506-1-helgaas@xxxxxxxxxx/
> > >
> > > Hmm, interesting, thanks for those links.
> > >
> > > Are you sure the config values will get lost on the reset? If we
> > > only reset the port by going into D3hot and back into D0, the
> > > device will remain powered and won't lose the config space, will
> > > it?
> >
> > I think you're doing a PM reset (transition to D3hot and back to
> > D0). Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0.
> > The spec doesn't actually *require* the device to be reset; it
> > only says the internal state of the device is undefined after
> > these transitions.
>
> Not requiring the device to be reset sounds sensible to me given
> that D3hot is what devices are transitioned into during suspend.
>
> But anyway, that doesn't really get us any further except it
> somewhat gives an explanation why the LTR is suddenly 0 after the
> reset. Or are you making the point that we shouldn't rely on
> "undefined state" for this hack because not all PCI bridges/ports
> will necessarily behave the same?

I guess I'm just making the point that I don't understand why the
bridge reset fixes something, and I'm not confident that the fix will
work on every system and continue working even if/when the PCI core
starts saving and restoring the L1 PM Substates capability.
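
One way to check whether the reset really clears the L1 PM Substates
timing fields is to capture the raw Control 1 and Control 2 dwords of
the root port before and after the reset (for example with setpci) and
decode them. The following is only a rough sketch of that decoding, not
something from the thread: the masks are assumed to match the
PCI_L1SS_CTL1_* and PCI_L1SS_CTL2_* definitions in
include/uapi/linux/pci_regs.h, and the helper names decode_l1ss() and
timing_fields_cleared() are made up for illustration.

```python
# Field masks, assumed to mirror PCI_L1SS_CTL1_* / PCI_L1SS_CTL2_* in
# include/uapi/linux/pci_regs.h.
CTL1_PCIPM_L1_2 = 0x00000001
CTL1_ASPM_L1_2 = 0x00000004
CTL1_CM_RESTORE_TIME = 0x0000FF00
CTL1_LTR_L12_TH_VALUE = 0x03FF0000
CTL1_LTR_L12_TH_SCALE = 0xE0000000
CTL2_T_PWR_ON_SCALE = 0x00000003
CTL2_T_PWR_ON_VALUE = 0x000000F8

# T_POWER_ON scale encodings in microseconds; encoding 3 is reserved.
T_POWER_ON_SCALE_US = {0: 2, 1: 10, 2: 100}


def decode_l1ss(ctl1, ctl2):
    """Hypothetical helper: decode raw L1SS Control 1/2 dwords."""
    th_value = (ctl1 & CTL1_LTR_L12_TH_VALUE) >> 16
    th_scale = (ctl1 & CTL1_LTR_L12_TH_SCALE) >> 29
    pwr_scale = ctl2 & CTL2_T_PWR_ON_SCALE
    pwr_value = (ctl2 & CTL2_T_PWR_ON_VALUE) >> 3

    # LTR threshold scale is 32**n nanoseconds for n = 0..5; larger
    # encodings are treated as undecodable here.
    threshold_ns = th_value * 32 ** th_scale if th_scale <= 5 else None
    scale_us = T_POWER_ON_SCALE_US.get(pwr_scale)
    t_power_on_us = pwr_value * scale_us if scale_us is not None else None

    return {
        "pcipm_l1_2": bool(ctl1 & CTL1_PCIPM_L1_2),
        "aspm_l1_2": bool(ctl1 & CTL1_ASPM_L1_2),
        "common_mode_restore_time_us": (ctl1 & CTL1_CM_RESTORE_TIME) >> 8,
        "ltr_l1_2_threshold_ns": threshold_ns,
        "t_power_on_us": t_power_on_us,
    }


def timing_fields_cleared(ctl1, ctl2):
    """True if T_POWER_ON, Common_Mode_Restore_Time and LTR_L1.2_THRESHOLD
    are all zero, regardless of the L1.2 enable bits."""
    ctl1_timing = (CTL1_CM_RESTORE_TIME | CTL1_LTR_L12_TH_VALUE
                   | CTL1_LTR_L12_TH_SCALE)
    ctl2_timing = CTL2_T_PWR_ON_SCALE | CTL2_T_PWR_ON_VALUE
    return (ctl1 & ctl1_timing) == 0 and (ctl2 & ctl2_timing) == 0


if __name__ == "__main__":
    # Illustrative values only, not taken from any real device.
    print(decode_l1ss(0x40963205, 0x29))
    print(timing_fields_cleared(0x00000005, 0x00))
```

If timing_fields_cleared() flips from False to True across the reset,
that would be consistent with the theory that the quirk works because
these values are lost rather than restored.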