Re: [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle


 



On 23.05.23 23:49, Bjorn Helgaas wrote:
> On Mon, May 22, 2023 at 01:45:55PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 05.05.23 08:56, Koba Ko wrote:
>>> On Thu, May 4, 2023 at 5:23 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>>>> [+cc Koba, Ajay, Tasev, Mark, Thomas, regressions list]
>>>> On Tue, Apr 11, 2023 at 03:42:29PM -0500, Bjorn Helgaas wrote:
>>>>> On Tue, Apr 11, 2023 at 08:32:04AM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217321
>>>>>> ...
>>>>>>         Regression: No
>>>>>>
>>>>>> [Symptom]
>>>>>> Intel CPU can't sleep deeper than PC3 during long idle
>>>>>> ~~~
>>>>>> Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>>>>>> 15.08   75.02   0.00    0.00    0.00    0.00    0.00
>>>>>> 15.09   75.02   0.00    0.00    0.00    0.00    0.00
>>>>>> Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>>>>>> 15.38   68.97   0.00    0.00    0.00    0.00    0.00
>>>>>> 15.38   68.96   0.00    0.00    0.00    0.00    0.00
>>>>>> ~~~
>>>>>> [How to Reproduce]
>>>>>> 1. run turbostat to monitor
>>>>>> 2. leave machine idle
>>>>>> 3. turbostat shows the CPU only reaching PC2/PC3.
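Step 3 can be checked mechanically. The sketch below is hypothetical: the function name `deepest_pkg_cstate` and the assumption that a data row contains exactly the seven package columns in the header order shown above (PC2, PC3, PC6, PC7, PC8, PC9, PC10) are illustrative only and not part of turbostat. It returns the deepest package C-state with nonzero residency in one turbostat data row.

```c
#include <assert.h>
#include <stdlib.h>

/* Package C-states in the column order of the turbostat header above:
 * Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
 * (assumed fixed for this illustration). */
static const int pkg_states[] = { 2, 3, 6, 7, 8, 9, 10 };
#define N_PKG_STATES (sizeof pkg_states / sizeof pkg_states[0])

/* Returns the deepest package C-state with nonzero residency in one
 * whitespace-separated data row, or 0 if no residency is found. */
static int deepest_pkg_cstate(const char *row)
{
    const char *p = row;
    int deepest = 0;

    assert(row != NULL);
    for (size_t i = 0; i < N_PKG_STATES; i++) {
        char *end;
        double pct = strtod(p, &end);   /* strtod skips leading spaces */

        if (end == p)                   /* no more numeric fields */
            break;
        if (pct > 0.0)
            deepest = pkg_states[i];
        p = end;
    }
    return deepest;
}
```

For the rows quoted in the report this returns 3, i.e. the package never gets past PC3.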
>>>>>>
>>>>>> [Misc]
>>>>>> The culprit is this commit:
>>>>>> a7152be79b62 Revert "PCI/ASPM: Save L1 PM Substates Capability for
>>>>>> suspend/resume"
>>>>>>
>>>>>> If a7152be79b62 is reverted, the issue is gone.
>>>>>
>>>>> Relevant commits:
>>>>>
>>>>>   4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
>>>>>   a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"")
>>>>>
>>>>> 4ff116d0d5fd appeared in v6.1-rc1.  Prior to 4ff116d0d5fd, ASPM L1 PM
>>>>> Substates configuration was not preserved across suspend/resume, so
>>>>> the system *worked* after resume, but used more power than expected.
>>>>>
>>>>> But 4ff116d0d5fd caused resume to fail completely on some systems, so
>>>>> a7152be79b62 reverted it.  With a7152be79b62 reverted, ASPM L1 PM
>>>>> Substates configuration is likely not preserved across suspend/resume.
>>>>> a7152be79b62 appeared in v6.2-rc8 and was backported to the v6.1
>>>>> stable series starting with v6.1.12.
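The save/restore idea behind 4ff116d0d5fd can be pictured with a small user-space model. This is a hypothetical sketch, not the kernel code: `struct fake_dev`, the flat config-space array, and the helper names are invented for illustration; only the register offsets follow Linux's `pci_regs.h` (`PCI_L1SS_CTL1`, `PCI_L1SS_CTL2`, relative to the L1 PM Substates capability). Writing CTL2 before CTL1 on restore is an assumption of the sketch, meant to put the timing values in place before the enable bits in CTL1 are set. It does not model whatever made the real patch break resume.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets within the L1 PM Substates Extended Capability (as in pci_regs.h). */
#define PCI_L1SS_CAP  0x04   /* read-only capabilities, not saved */
#define PCI_L1SS_CTL1 0x08   /* enable bits and LTR threshold */
#define PCI_L1SS_CTL2 0x0c   /* T_POWER_ON timing */

/* Hypothetical device model: a flat 4 KiB config space. */
struct fake_dev {
    uint8_t cfg[4096];
    uint16_t l1ss_pos;       /* offset of the L1SS capability (assumed found) */
    uint32_t saved_ctl1;
    uint32_t saved_ctl2;
};

static uint32_t cfg_read32(const struct fake_dev *d, uint16_t off)
{
    uint32_t v;
    memcpy(&v, &d->cfg[off], sizeof v);
    return v;
}

static void cfg_write32(struct fake_dev *d, uint16_t off, uint32_t v)
{
    memcpy(&d->cfg[off], &v, sizeof v);
}

/* Before suspend: remember the writable control registers. */
static void save_l1ss(struct fake_dev *d)
{
    assert(d->l1ss_pos != 0);   /* capability must have been located */
    d->saved_ctl1 = cfg_read32(d, d->l1ss_pos + PCI_L1SS_CTL1);
    d->saved_ctl2 = cfg_read32(d, d->l1ss_pos + PCI_L1SS_CTL2);
}

/* Model the device losing its L1SS configuration across suspend. */
static void simulate_reset(struct fake_dev *d)
{
    cfg_write32(d, d->l1ss_pos + PCI_L1SS_CTL1, 0);
    cfg_write32(d, d->l1ss_pos + PCI_L1SS_CTL2, 0);
}

/* After resume: write CTL2 (timing) first, then CTL1 (enables). */
static void restore_l1ss(struct fake_dev *d)
{
    cfg_write32(d, d->l1ss_pos + PCI_L1SS_CTL2, d->saved_ctl2);
    cfg_write32(d, d->l1ss_pos + PCI_L1SS_CTL1, d->saved_ctl1);
}
```

Without the `save_l1ss()`/`restore_l1ss()` pair, the control registers stay zero after `simulate_reset()`, which corresponds to L1 substates being left disabled after resume.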
>>>>>
>>>>> Koba, you don't mention any suspend/resume in this bug report, but
>>>>> neither patch should make any difference unless suspend/resume is
>>>>> involved.  Does the platform sleep as expected *before* suspend, but
>>>>> fail to sleep after resume?
>>>>>
>>>>> Or maybe some individual device was suspended via runtime power
>>>>> management, and that device lost its L1 PM Substates config?  I don't
>>>>> know if there's a way to disable runtime PM easily.
>>>>
>>>> Koba, per your bugzilla update, the issue happens even without
>>>> suspend/resume.  And we don't know whether some particular device is
>>>> responsible.
>>>>
>>>> But if we save/restore L1SS state, we can sleep deeper than PC3.  If
>>>> we don't preserve L1SS state, we can't.
>>>>
>>>> We definitely want to preserve the L1SS state, but we can't simply
>>>> apply 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
>>>> suspend/resume") again because it caused its own regressions [1,2,3]
>>>>
>>>> So somebody needs to figure out what was wrong with 4ff116d0d5fd, fix
>>>> it, verify that it doesn't cause the issues reported by Tasev, Thomas,
>>>> and Mark, and then we can apply it.
>>>
>>> Good day. I discussed this with Kai-Heng, and he mentioned that the GPU
>>> may not be powered off, in which case the GPU needs L1SS to enter
>>> power saving.
>>>
>>> I will investigate further on this way.
>>
>> Did anything come out of this?
>>
>> FWIW, I'm considering dropping this from the list of tracked regressions.
>> Yes, this is a regression, but it's caused by a fix for another (worse)
>> regression -- so there is nothing we can do for now anyway (and Koba
>> seems motivated already to look properly into all of this). Or does
>> anyone consider this to be a problem?
> 
> I would drop this from the regression list.
> 
> Yes, bz 217321 is a bug, and yes, 4ff116d0d5fd is a partial fix for
> it, but 4ff116d0d5fd causes worse problems (it breaks resume from
> suspend) than just living with bz 217321, which is a "mere" power
> consumption issue.

Thx for confirming and putting it in better words.

#regzbot inconclusive: can't be solved for now, as this is a regression
caused by a fix for a regression (see list/bz for details)

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


