Hi,
On 05-01-17 16:06, Lukas Wunner wrote:
On Wed, Jan 04, 2017 at 06:21:14PM -0500, David Airlie wrote:
On Wednesday, January 04, 2017 10:09:54 PM Peter Wu wrote:
On Wed, Jan 04, 2017 at 09:16:39AM +0100, Lukas Wunner wrote:
On Tue, Jan 03, 2017 at 06:05:57PM -0600, Bjorn Helgaas wrote:
I don't *want* to apply the revert. It's on my for-linus branch as a
worst-case scenario change if we can't figure out a better fix.
The patch below is preferable, but I'd rather not take even it,
because it takes away functionality and forces people to use a boot
parameter to restore it. I expect that somebody will figure out how
to fix the regression Kilian found and also keep the new functionality
(without requiring boot parameters) before v4.10.
The issue is constrained to hybrid graphics laptops with Nvidia discrete
GPU using nouveau. Hence it needs to be fixed in nouveau, not in the
PCI core.
The problem is not necessarily in the nouveau driver; the same problem
occurs when you enable RPM without loading nouveau. The issue is limited
though to some newer hybrid graphics laptops with Nvidia GPUs. While a
quirk can be added to nouveau, I think that a (temporary) quirk in core
would also be reasonable (since it also occurs without nouveau).
(AFAIUI, laptops with AMD discrete GPU are not affected as it is known
when and how to call an ACPI method versus using PR3.)
(Neither are laptops using the Nvidia proprietary driver as it doesn't
runtime suspend the card. But battery life will be terrible then.)
We're at rc2 so the time frame for coming up with a fix is probably
4 weeks. Peter and others have tried for months to reverse-engineer
how to handle runtime PM on newer Nvidia cards. It seems likely that
we'll not find the ultimate solution to the problem within 4 weeks.
Yep, a quick proper fix seems unlikely.
[ Help/ideas are welcome, I suspect that these failures to restore power
on laptops designed for Win8+ all have the same cause, related to some
unknown interaction between ACPI and PCI. Some links:
https://bugzilla.kernel.org/show_bug.cgi?id=190861
https://bugzilla.kernel.org/show_bug.cgi?id=156341 ]
The way it is now, i.e. defaulting to PR3 when available, regresses
certain laptops such as Kilian's. If on the other hand we default to
DSM when available, we'll regress certain other laptops, as Peter has
pointed out. Whitelisting or blacklisting laptops doesn't seem a good
approach either; ideally we'd want to use PR3 as Windows does.
As said, the only short-term solution I see is to add an "optimus"
module_param to nouveau to allow users to select which method to use.
So in Kilian's case an additional command line parameter would be
necessary to fix the issue.
Does anyone see a better solution or can we agree on this one? If so
I can come up with a patch. This could go in via Dave Airlie's tree.
As pcie_port_pm=off already reverts to DSM, I do not think that an
additional (temporary) nouveau module parameter is going to help. I
instead propose a (hopefully temporary) quirk in PCI core that disables
D3cold RPM for just Kilian's Lenovo laptop (basically defaulting to
pcie_port_pm=off). Then the option pcie_port_pm=force can still be used
to test possible solutions in the future.
I would rather add a quirk to the ACPI core to prevent the power resources in
question from being enumerated. Or even to prevent ACPI PM from being
used for the port in question.
I do have a W541 in a cupboard in the office somewhere, but I won't be close to
it for a couple of weeks. The W541 was the first place I tested the pm patches
so I'm kinda wondering whether it's all W541s or just some specific model/BIOS
combo.
However I'm pretty much unavailable to do anything much until late Jan on this.
Is there anyone else at Red Hat who might be able to look into this?
ISTR that Hans de Goede is working on improving laptop support in Fedora,
and Peter Jones recently got a patch merged for the W541 with the exact
same firmware Kilian is using to work around a botched EFI memory map.
Adding them to cc: in the hope that they may be able to help.
@Peter, have you noticed issues with the discrete Nvidia GPU on your W541
related to runtime suspend and system sleep?
I've a W541 sitting in my home office as well. I will take it through
some GPU runtime suspend/resume testing. Which kernel introduces the
problem I'm looking for?
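
For anyone repeating this kind of test, a minimal sketch of how the
runtime PM state of the GPU could be read from userspace. It relies only
on the standard sysfs attributes power/control and power/runtime_status.
The helper name read_runtime_pm and the PCI address 0000:01:00.0 are
assumptions for illustration; check lspci for the real address of the
discrete GPU on a given machine.

```python
from pathlib import Path


def read_runtime_pm(device, sysfs_root="/sys/bus/pci/devices"):
    """Return the runtime PM attributes of one PCI device.

    `device` is a PCI address such as "0000:01:00.0" (hypothetical here).
    An attribute that cannot be read is reported as None.
    """
    power_dir = Path(sysfs_root) / device / "power"
    state = {}
    for name in ("control", "runtime_status"):
        try:
            # Typical values: control is "auto" or "on"; runtime_status is
            # "active", "suspended", "suspending", "resuming", "error"
            # or "unsupported".
            state[name] = (power_dir / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


if __name__ == "__main__":
    # Assumed address of the discrete GPU; adjust for the machine under test.
    print(read_runtime_pm("0000:01:00.0"))
```

Running it before and after the GPU has been idle for a while shows
whether the device actually reaches "suspended" and whether it comes
back to "active" on the next access.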
I believe mine has the old BIOS / EFI which is less troublesome so I
will first see if I can reproduce the problem with that and then upgrade
to see if that introduces the problem.
Peter, IIRC you said that after upgrading the firmware I need a new enough
kernel to be able to even boot. From which kernel onwards will the machine
boot with the new firmware?
Also, is it possible to downgrade the EFI again? ...
Regards,
Hans