Re: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

Dear all,

Sounds interesting. I could try updating the BIOS to 2.29.

Shall I do so?

Best regards

Kilian



On 11-Jan-17 12:04, Hans de Goede wrote:
> Hi,
>
> On 05-01-17 16:06, Lukas Wunner wrote:
>> On Wed, Jan 04, 2017 at 06:21:14PM -0500, David Airlie wrote:
>>>> On Wednesday, January 04, 2017 10:09:54 PM Peter Wu wrote:
>>>>> On Wed, Jan 04, 2017 at 09:16:39AM +0100, Lukas Wunner wrote:
>>>>>> On Tue, Jan 03, 2017 at 06:05:57PM -0600, Bjorn Helgaas wrote:
>>>>>>> I don't *want* to apply the revert.  It's on my for-linus branch
>>>>>>> as a
>>>>>>> worst-case scenario change if we can't figure out a better fix.
>>>>>>>
>>>>>>> The patch below is preferable, but I'd rather not take even it,
>>>>>>> because it takes away functionality and forces people to use a boot
>>>>>>> parameter to restore it.  I expect that somebody will figure out
>>>>>>> how
>>>>>>> to fix the regression Kilian found and also keep the new
>>>>>>> functionality
>>>>>>> (without requiring boot parameters) before v4.10.
>>>>>>
>>>>>> The issue is constrained to hybrid graphics laptops with Nvidia
>>>>>> discrete
>>>>>> GPU using nouveau.  Hence it needs to be fixed in nouveau, not in
>>>>>> the
>>>>>> PCI core.
>>>>>
>>>>> The problem is not necessarily in the nouveau driver, the same
>>>>> problem
>>>>> occurs when you enable RPM without loading nouveau. The issue is
>>>>> limited
>>>>> though to some newer hybrid graphics laptops with Nvidia GPUs.
>>>>> While a
>>>>> quirk can be added to nouveau, I think that a (temporary) quirk in
>>>>> core
>>>>> would also be reasonable (since it also occurs without nouveau).
>>>>>
>>>>>> (AFAIUI, laptops with AMD discrete GPU are not affected as it is
>>>>>> known
>>>>>> when and how to call an ACPI method versus using PR3.)
>>>>>>
>>>>>> (Neither are laptops using the Nvidia proprietary driver as it
>>>>>> doesn't
>>>>>> runtime suspend the card.  But battery life will be terrible then.)
>>>>>>
>>>>>> We're at rc2 so the time frame for coming up with a fix is probably
>>>>>> 4 weeks.  Peter and others have tried for months to reverse-engineer
>>>>>> how to handle runtime PM on newer Nvidia cards.  It seems likely
>>>>>> that
>>>>>> we'll not find the ultimate solution to the problem within 4 weeks.
>>>>>
>>>>> Yep, a quick proper fix seems unlikely.
>>>>> [ Help/ideas are welcome, I suspect that these failures to restore
>>>>> power
>>>>> on laptops designed for Win8+ all have the same cause, related to
>>>>> some
>>>>> unknown interaction between ACPI and PCI. Some links:
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=190861
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=156341 ]
>>>>>
>>>>>> The way it is now, i.e. defaulting to PR3 when available, regresses
>>>>>> certain laptops such as Kilian's.  If on the other hand we
>>>>>> default to
>>>>>> DSM when available, we'll regress certain other laptops, as Peter
>>>>>> has
>>>>>> pointed out.  Whitelisting or blacklisting laptops doesn't seem a
>>>>>> good
>>>>>> approach either, ideally we'd want to use PR3 as Windows does.
>>>>>>
>>>>>> As said, the only short-term solution I see is to add an "optimus"
>>>>>> module_param to nouveau to allow users to select which method to
>>>>>> use.
>>>>>> So in Kilian's case an additional command line parameter would be
>>>>>> necessary to fix the issue.
>>>>>>
>>>>>> Does anyone see a better solution or can we agree on this one? 
>>>>>> If so
>>>>>> I can come up with a patch.  This could go in via Dave Airlie's
>>>>>> tree.
>>>>>
>>>>> As pcie_port_pm=off already reverts to DSM, I do not think that an
>>>>> additional (temporary) nouveau module parameter is going to help. I
>>>>> instead propose a (hopefully temporary) quirk in pci core that
>>>>> disables
>>>>> D3cold RPM for just Kilian's Lenovo laptop (basically defaulting to
>>>>> pcie_port_pm=off). Then the option pcie_port_pm=force can still be
>>>>> used
>>>>> to test possible solutions in the future.
>>>>
>>>> I would rather add a quirk to the ACPI core to prevent the power
>>>> resources in
>>>> question from being enumerated.  Or even to prevent ACPI PM from being
>>>> used for the port in question.
>>>
>>> I do have a W541 in a cupboard in the office somewhere, but I won't
>>> be close to
>>> it for a couple of weeks. The W541 was the first place I tested the
>>> pm patches
>>> so I'm kinda wondering whether it's all W541's or just some specific
>>> model/bios
>>> combo.
>>>
>>> However I'm pretty much unavailable to do anything much until late
>>> Jan on this.
>>
>> Is there anyone else at Red Hat who might be able to look into this?
>>
>> ISTR that Hans de Goede is working on improving laptop support in
>> Fedora,
>> and Peter Jones recently got a patch merged for the W541 with the exact
>> same firmware Kilian is using to work around a botched EFI memory map.
>> Adding them to cc: in the hope that they may be able to help.
>>
>> @Peter, have you noticed issues with the discrete Nvidia GPU on your
>> W541
>> related to runtime suspend and system sleep?
>
> I've tried to reproduce this problem on my W541, which has the exact
> same CPU + GPU combo as the reporter of:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=190861
>
> But no luck. I started out with BIOS 2.27 and, when I could not reproduce
> it, updated to 2.29 (in retrospect I should first have tried 2.28, which
> is what the reporter has), and still had no luck reproducing it.
>
> I'll attach acpidumps of the 2 Bios versions I've tried to the bug.
>
> Regards,
>
> Hans
>



