Hi Thomas, I had updated the bugzilla with a request to run some commands to collect more information from your system. Are you still able to work on this? Thanks. https://bugzilla.kernel.org/show_bug.cgi?id=216877#c27 David On Thu, 2023-10-05 at 17:57 +0200, Thomas Witt wrote: > On 05/10/2023 17:30, Bjorn Helgaas wrote: > > On Thu, Oct 05, 2023 at 12:02:58PM +0300, Mika Westerberg wrote: > > > On Wed, Oct 04, 2023 at 05:23:24PM -0500, Bjorn Helgaas wrote: > > > ... > > > > > The thing with TUXEDO is that on Thomas's system with mem_sleep=deep > > > this patch (without denylist) breaks the resume as he describes here: > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=216877 > > > > > > However, on exact same TUXEDO system with the same firmwares Werner is > > > not able to reproduce the issue with or without the patch. So I'm not > > > sure what to do and that's why I added denylist that should take effect > > > on Thomas' system when mem_sleep=deep is set but also work on the same > > > system without it. > > > > > > And since we have the denylist, I decided to add the ASUS there to avoid > > > accidentally breaking those too. > > > ... > > > > > > I think there's still something we're missing. > > > > > > > > We restore the LTR config before restoring DEVCTL2 (including the LTR > > > > enable bit) and L1SS state. I don't think we know the state of ASPM > > > > and L1SS at that point, do we? Do you think there could be an issue > > > > there, too? > > > > > > AFAICT LTR does not affect until it is explicitly enabled and I don't > > > think many drivers actually program it (although we have some sort of > > > API for it at least for Intel LPSS devices). > > > > I couldn't find anything in the spec that suggests LTR should be an > > issue. I'm just grasping at straws here. > > > > There's obviously *something* we're doing wrong because ASPM worked > > before suspend, so we should be able to get it to work after resume. > > > > Could we learn anything by dumping config space of the problem devices > > before/after the suspend/resume and comparing them? > > > > If we could figure out a difference between Werner's working TUXEDO > > and Thomas' non-working TUXEDO, that might be a hint, too. > > > > > If you have suggestions, I'm all open. If I understand you would like > > > this to be done like: > > > > > > - Drop the denylist > > > - Add back the suspend/restore of L1SS > > > - Ask everyone in this thread to try it out > > > > > > I can do that no problem but I guess that for the TUXEDO one (Thomas') > > > it probably is going to fail still. > > > > Right, without the denylist, I expect Thomas' TUXEDO to fail, but I > > still hope we can figure out why. If we just keep it on the denylist, > > that system will suffer from more power consumption than necessary, > > but only after suspend/resume. > > > > A denylist seems like the absolute last resort. In this case we don't > > know about anything *wrong* with those platforms; all we know is that > > our resume path doesn't work. It's likely that it fails on other > > platforms we haven't heard about, too. > > > > Bjorn > > The best guess from Mika and David was a firmware issue, but I run the > same Firmware revision as Werner. I even reflashed the Firmware, but > that did not change anything: > > Quoting David Box: > > I agree that we should pursue an exception for your system. This is > > looking like a firmware bug. One thing we did notice in the turbostat > > results is your IRTL (Interrupt Response Time Limit) values are bogus: > > > > cpu6: MSR_PKGC3_IRTL: 0x0000884e (valid, 79872 ns) > > cpu6: MSR_PKGC6_IRTL: 0x00008000 (valid, 0 ns) > > cpu6: MSR_PKGC7_IRTL: 0x00008000 (valid, 0 ns) > > cpu6: MSR_PKGC8_IRTL: 0x00008000 (valid, 0 ns) > > cpu6: MSR_PKGC9_IRTL: 0x00008000 (valid, 0 ns) > > cpu6: MSR_PKGC10_IRTL: 0x00008000 (valid, 0 ns) > > > > This is despite the PKGC configuration register showing that all > > states are enabled: > > > > cpu6: MSR_PKG_CST_CONFIG_CONTROL: 0x1e008008 (UNdemote-C3, > UNdemote-C1, demote- > C3, demote-C1, locked, pkg-cstate-limit=8 (unlimited)) > > > > Firmware sets this. > > Since I can't currently flash modified firmware on this computer, can > those values be overridden from userspace? >