Re: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Lyude,

On 09-01-17 16:11, Lyude Paul wrote:
fwiw, I just tried on the W541 I have 4.8.15-300.fc25.x86_64 running on
here and so far it seems to suspend/resume just fine using firmware
version 2.19

Note this is not about normal suspend resume, but runtime
suspend/resume of the nvidia discrete GPU...

Try running glxgears like this:

DRI_PRIME=1 glxgears -info | grep REND

(the grep is to check you're really running on the nvidia GPU).

Then you should see msgs in dmesg about nouveau resuming the gpu,
then kill glxgears and wait for 5 seconds, now the nouveau drv
should say the gpu is suspending, etc.

If it never runtime suspends, then make sure you are not using
any external screens, only the built-in laptop screen.

Regards,

Hans



On Thu, 2017-01-05 at 14:36 -0500, David Airlie wrote:
(cc'ing Lyude, who has the hw also I think).

----- Original Message -----
From: "Peter Jones" <pjones@xxxxxxxxxx>
To: "Lukas Wunner" <lukas@xxxxxxxxx>
Cc: "David Airlie" <airlied@xxxxxxxxxx>, "Rafael J. Wysocki" <rjw@r
jwysocki.net>, "Peter Wu" <peter@xxxxxxxxxxxxx>,
"Bjorn Helgaas" <helgaas@xxxxxxxxxx>, "Mika Westerberg" <mika.weste
rberg@xxxxxxxxxxxxxxx>, "Kilian Singer"
<kilian.singer@xxxxxxxxxxxxxxxxxxxxxx>, "linux-pci" <linux-pci@vger
.kernel.org>, "Alex Deucher"
<alexander.deucher@xxxxxxx>, "Hans de Goede" <hdegoede@xxxxxxxxxx>
Sent: Friday, 6 January, 2017 4:13:23 AM
Subject: Re: PCI: Revert "PCI: Add runtime PM support for PCIe
ports"

On Thu, Jan 05, 2017 at 04:06:46PM +0100, Lukas Wunner wrote:
On Wed, Jan 04, 2017 at 06:21:14PM -0500, David Airlie wrote:
On Wednesday, January 04, 2017 10:09:54 PM Peter Wu wrote:
On Wed, Jan 04, 2017 at 09:16:39AM +0100, Lukas Wunner
wrote:
On Tue, Jan 03, 2017 at 06:05:57PM -0600, Bjorn Helgaas
wrote:
I don't *want* to apply the revert.  It's on my for-
linus branch
as a
worst-case scenario change if we can't figure out a
better fix.

The patch below is preferable, but I'd rather not take
even it,
because it takes away functionality and forces people
to use a
boot
parameter to restore it.  I expect that somebody will
figure out
how
to fix the regression Kilian found and also keep the
new
functionality
(without requiring boot parameters) before v4.10.

The issue is constrained to hybrid graphics laptops with
Nvidia
discrete
GPU using nouveau.  Hence it needs to be fixed in
nouveau, not in
the
PCI core.

The problem is not necessarily in the nouveau driver, the
same
problem
occurs when you enable RPM without loading nouveau. The
issue is
limited
though to some newer hybrid graphics laptops with Nvidia
GPUs. While
a
quirk can be added to nouveau, I think that a (temporary)
quirk in
core
would also be reasonable (since it also occurs without
nouveau).

(AFAIUI, laptops with AMD discrete GPU are not affected
as it is
known
when and how to call an ACPI method versus using PR3.)

(Neither are laptops using the Nvidia proprietary driver
as it
doesn't
runtime suspend the card.  But battery life will be
terrible then.)

We're at rc2 so the time frame for coming up with a fix
is probably
4 weeks.  Peter and others have tried for months to
reverse-engineer
how to handle runtime PM on newer Nvidia cards.  It seems
likely
that
we'll not find the ultimate solution to the problem
within 4 weeks.

Yep, a quick proper fix seems unlikely.
[ Help/ideas are welcome, I suspect that these failures to
restore
power
on laptops designed for Win8+ all have the same cause,
related to
some
unknown interaction between ACPI and PCI. Some links:
https://bugzilla.kernel.org/show_bug.cgi?id=190861
https://bugzilla.kernel.org/show_bug.cgi?id=156341 ]

The way it is now, i.e. defaulting to PR3 when available,
regresses
certain laptops such as Kilian's.  If on the other hand
we default
to
DSM when available, we'll regress certain other laptops,
as Peter
has
pointed out.  Whitelisting or blacklisting laptops
doesn't seem a
good
approach either, ideally we'd want to use PR3 as Windows
does.

As said, the only short-term solution I see is to add an
"optimus"
module_param to nouveau to allow users to select which
method to
use.
So in Kilian's case an additional command line parameter
would be
necessary to fix the issue.

Does anyone see a better solution or can we agree on this
one?  If
so
I can come up with a patch.  This could go in via Dave
Airlie's
tree.

As pcie_port_pm=off already reverts to DSM, I do not think
that an
additional (temporary) nouveau module parameter is going to
help. I
instead propose a (hopefully temporary) quirk in pci core
that
disables
D3cold RPM for just Kilians Lenovo laptop (basically
defaulting to
pcie_port_pm=off). Then the option pcie_port_pm=force can
still be
used
to test possible solutions in the future.

I would rather add a quirk to the ACPI core to prevent the
power
resources in
question from being enumerated.  Or even to prevent ACPI PM
from being
used for the port in question.

I do have a W541 in a cupboard in the office somewhere, but I
won't be
close to
it for a couple of weeks. The W541 was the first place I tested
the pm
patches
so I'm kinda wondering whether it's all W541's or just some
specific
model/bios
combo.

They seem to all ship with the 1.10 firmware, and 2.80 is current
(there
are a bunch of intermediate 2.xx versions).  Somewhere along the
line
they introduced some bugs in the UEFI stuff, so it wouldn't be
surprising if there's bugs introduced elsewhere as well.

However I'm pretty much unavailable to do anything much until
late Jan on
this.

Is there anyone else at Red Hat who might be able to look into
this?

ISTR that Hans de Goede is working on improving laptop support in
Fedora,
and Peter Jones recently got a patch merged for the W541 with the
exact
same firmware Kilian is using to work around a botched EFI memory
map.
Adding them to cc: in the hope that they may be able to help.

@Peter, have you noticed issues with the discrete Nvidia GPU on
your W541
related to runtime suspend and system sleep?

I was using a borrowed one (I can certainly find it again, but I'm
not
working on graphics/pm really), but yeah - shutdown and lspci both
broke
sometime after pci_pm_runtime_resume().  Here's the traceback from
SYS_reboot(): https://goo.gl/photos/T1fr1bksHQb9RSU67

Dave, if you know who in Westford should have a look at this, I can
see
about getting them hardware.  I am more or less surrounded by that
team.

--
        Peter

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux