On 05.05.23 08:56, Koba Ko wrote:
> On Thu, May 4, 2023 at 5:23 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>> [+cc Koba, Ajay, Tasev, Mark, Thomas, regressions list]
>> On Tue, Apr 11, 2023 at 03:42:29PM -0500, Bjorn Helgaas wrote:
>>> On Tue, Apr 11, 2023 at 08:32:04AM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217321
>>>> ...
>>>> Regression: No
>>>>
>>>> [Symptom]
>>>> Intel cpu can't sleep deeper than pc3 during long idle
>>>> ~~~
>>>> Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>>>>   15.08   75.02    0.00    0.00    0.00    0.00    0.00
>>>>   15.09   75.02    0.00    0.00    0.00    0.00    0.00
>>>> ^CPkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>>>>   15.38   68.97    0.00    0.00    0.00    0.00    0.00
>>>>   15.38   68.96    0.00    0.00    0.00    0.00    0.00
>>>> ~~~
>>>> [How to Reproduce]
>>>> 1. run turbostat to monitor
>>>> 2. leave machine idle
>>>> 3. turbostat show cpu only go into pc2~pc3.
>>>>
>>>> [Misc]
>>>> The culprit are this
>>>> a7152be79b62) Revert "PCI/ASPM: Save L1 PM Substates Capability for
>>>> suspend/resume"
>>>>
>>>> if revert a7152be79b62, the issue is gone
>>>
>>> Relevant commits:
>>>
>>>   4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
>>>   a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"")
>>>
>>> 4ff116d0d5fd appeared in v6.1-rc1. Prior to 4ff116d0d5fd, ASPM L1 PM
>>> Substates configuration was not preserved across suspend/resume, so
>>> the system *worked* after resume, but used more power than expected.
>>>
>>> But 4ff116d0d5fd caused resume to fail completely on some systems, so
>>> a7152be79b62 reverted it. With a7152be79b62 reverted, ASPM L1 PM
>>> Substates configuration is likely not preserved across suspend/resume.
>>> a7152be79b62 appeared in v6.2-rc8 and was backported to the v6.1
>>> stable series starting with v6.1.12.
>>>
>>> KobaKo, you don't mention any suspend/resume in this bug report, but
>>> neither patch should make any difference unless suspend/resume is
>>> involved. Does the platform sleep as expected *before* suspend, but
>>> fail to sleep after resume?
>>>
>>> Or maybe some individual device was suspended via runtime power
>>> management, and that device lost its L1 PM Substates config? I don't
>>> know if there's a way to disable runtime PM easily.
>>
>> Koba, per your bugzilla update, the issue happens even without
>> suspend/resume. And we don't know whether some particular device is
>> responsible.
>>
>> But if we save/restore L1SS state, we can sleep deeper than PC3. If
>> we don't preserve L1SS state, we can't.
>>
>> We definitely want to preserve the L1SS state, but we can't simply
>> apply 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
>> suspend/resume") again because it caused its own regressions [1,2,3]
>>
>> So somebody needs to figure out what was wrong with 4ff116d0d5fd, fix
>> it, verify that it doesn't cause the issues reported by Tasev, Thomas,
>> and Mark, and then we can apply it.
>
> Good days, discussed with Kai-Heng and he mentioned the GPU may not
> be pulled off the power.
> then the GPU needs L1ss to get into power saving.
>
> I will investigate further on this way.

Did anything come out of this?

FWIW, I'm considering dropping this from the list of tracked
regressions. Yes, this is a regression, but it's caused by a fix for
another (worse) regression -- so there is nothing we can do for now
anyway (and Koba already seems motivated to look into all of this
properly).

Or does anyone consider this to be a problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke