On 10/18/21 17:35, Bjorn Helgaas wrote:
On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dreßler wrote:
On 10/12/21 17:39, Bjorn Helgaas wrote:
[+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread:
https://lore.kernel.org/all/20211011134238.16551-1-verdre@xxxxxxx/]
I wonder if this reset quirk works because pci_reset_function() saves
and restores much of config space, but it currently does *not* restore
the L1 PM Substates capability, so those T_POWER_ON,
Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get
cleared out by the reset. We did briefly save/restore it [1], but we
had to revert that because of a regression that AFAIK was never
resolved [2]. I expect we will eventually save/restore this, so if
the quirk depends on it *not* being restored, that would be a problem.
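To make "cleared out" concrete, here is a minimal Python sketch that decodes those three timing fields from raw L1 PM Substates Control 1 and Control 2 register values (for example, values read with setpci). The masks follow the Control 1/2 register layout in the PCIe Base Spec. Linux names the Control 1 fields PCI_L1SS_CTL1_* in pci_regs.h. The function name decode_l1ss is hypothetical.

```python
# L1 PM Substates Control 1 (cap + 0x08) and Control 2 (cap + 0x0c) fields,
# per the PCIe L1 PM Substates capability layout.
CTL1_CM_RESTORE_TIME = 0x0000FF00   # Common_Mode_Restore_Time, bits 15:8 (us)
CTL1_LTR_L12_TH_VALUE = 0x03FF0000  # LTR_L1.2_THRESHOLD value, bits 25:16
CTL1_LTR_L12_TH_SCALE = 0xE0000000  # LTR_L1.2_THRESHOLD scale, bits 31:29
CTL2_T_PWR_ON_SCALE = 0x00000003    # T_POWER_ON scale, bits 1:0
CTL2_T_PWR_ON_VALUE = 0x000000F8    # T_POWER_ON value, bits 7:3

LTR_SCALE_NS = [1, 32, 1024, 32768, 1048576, 33554432]  # scale encodings 0..5
T_PWR_ON_SCALE_US = [2, 10, 100]                          # scale encodings 0..2


def decode_l1ss(ctl1, ctl2):
    """Decode the timing fields that a reset might clear (hypothetical helper)."""
    cm_restore_us = (ctl1 & CTL1_CM_RESTORE_TIME) >> 8
    th_value = (ctl1 & CTL1_LTR_L12_TH_VALUE) >> 16
    th_scale = (ctl1 & CTL1_LTR_L12_TH_SCALE) >> 29
    pwr_scale = ctl2 & CTL2_T_PWR_ON_SCALE
    pwr_value = (ctl2 & CTL2_T_PWR_ON_VALUE) >> 3

    # Reserved scale encodings are reported as None instead of a number.
    threshold_ns = None
    if th_scale < len(LTR_SCALE_NS):
        threshold_ns = th_value * LTR_SCALE_NS[th_scale]
    t_power_on_us = None
    if pwr_scale < len(T_PWR_ON_SCALE_US):
        t_power_on_us = pwr_value * T_PWR_ON_SCALE_US[pwr_scale]

    return {
        "common_mode_restore_us": cm_restore_us,
        "ltr_l12_threshold_ns": threshold_ns,
        "t_power_on_us": t_power_on_us,
    }
```

If a reset really wipes these registers, both readbacks would be 0, and decode_l1ss(0, 0) reports all three values as 0.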
You should be able to test whether this is the critical thing by
clearing those registers with setpci instead of doing the reset. Per
spec, they can only be modified when L1.2 is disabled, so you would
have to disable it via sysfs (for the endpoint, I think)
/sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root
port, then re-enable L1.2.
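The order matters because of the spec restriction mentioned above. Below is a minimal Python sketch of the value computation for that experiment. It refuses to proceed while either L1.2 enable bit is still set, and otherwise returns Control 1/2 values with the timing fields zeroed and the L1.1 enables left alone. The function name cleared_l1ss is hypothetical. The bit positions follow the PCIe L1 PM Substates capability (PCI_L1SS_CTL1_PCIPM_L1_2 and PCI_L1SS_CTL1_ASPM_L1_2 in Linux's pci_regs.h). Actually reading and writing the registers, for example with setpci, is left out.

```python
from typing import Tuple

PCI_L1SS_CTL1_PCIPM_L1_2 = 0x00000001  # PCI-PM L1.2 Enable, bit 0
PCI_L1SS_CTL1_ASPM_L1_2 = 0x00000004   # ASPM L1.2 Enable, bit 2
L12_ENABLES = PCI_L1SS_CTL1_PCIPM_L1_2 | PCI_L1SS_CTL1_ASPM_L1_2

# Common_Mode_Restore_Time (15:8) plus LTR_L1.2_THRESHOLD value/scale (25:16, 31:29)
CTL1_TIMING = 0x0000FF00 | 0x03FF0000 | 0xE0000000
# T_POWER_ON scale (1:0) plus value (7:3) in Control 2
CTL2_T_PWR_ON = 0x00000003 | 0x000000F8


def cleared_l1ss(ctl1: int, ctl2: int) -> Tuple[int, int]:
    """Return (ctl1, ctl2) with the L1.2 timing fields cleared (hypothetical helper)."""
    # Per spec these fields may only be modified while L1.2 is disabled.
    if ctl1 & L12_ENABLES:
        raise ValueError("L1.2 still enabled; disable l1_2_aspm/l1_2_pcipm first")
    return ctl1 & ~CTL1_TIMING & 0xFFFFFFFF, ctl2 & ~CTL2_T_PWR_ON & 0xFFFFFFFF
```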
[1] https://git.kernel.org/linus/4257f7e008ea
[2] https://lore.kernel.org/all/20210127160449.2990506-1-helgaas@xxxxxxxxxx/
Hmm, interesting, thanks for those links.
Are you sure the config values will get lost on the reset? If we only reset
the port by going into D3hot and back into D0, the device will remain powered
and won't lose the config space, will it?
I think you're doing a PM reset (transition to D3hot and back to D0).
Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0. The spec
doesn't actually *require* the device to be reset; it only says the
internal state of the device is undefined after these transitions.
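A rough Python model of that decision follows. It is loosely based on how pci_pm_reset() in drivers/pci/pci.c uses the PMCSR (PM Control/Status) register. If No_Soft_Reset (bit 3) is set, no PM reset is attempted. Otherwise the PowerState field (bits 1:0) is written to D3hot and then back to D0. The function name pm_reset_sequence is hypothetical, and the required delays between writes are omitted.

```python
PCI_PM_CTRL_STATE_MASK = 0x0003     # PowerState field, bits 1:0
PCI_PM_CTRL_NO_SOFT_RESET = 0x0008  # No_Soft_Reset, bit 3
PCI_D0, PCI_D3HOT = 0, 3


def pm_reset_sequence(pmcsr):
    """Return the PMCSR values written for a D3hot->D0 PM reset, or None if skipped."""
    if pmcsr & PCI_PM_CTRL_NO_SOFT_RESET:
        # The device reports it keeps its state across D3hot->D0, so Linux
        # does not treat this transition as a reset method.
        return None
    base = pmcsr & ~PCI_PM_CTRL_STATE_MASK
    # First write enters D3hot, second returns to D0 (delays omitted).
    return [base | PCI_D3HOT, base | PCI_D0]
```

Even when this sequence runs, the spec text above only promises "undefined" internal state, not a guaranteed reset of every register.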
Not requiring the device to be reset sounds sensible to me given that
D3hot is what devices are transitioned into during suspend.
But anyway, that doesn't really get us any further, except that it partly
explains why the LTR is suddenly 0 after the reset. Or are
you making the point that we shouldn't rely on "undefined state" for this
hack because not all PCI bridges/ports will necessarily behave the same?