Re: Expecting to revert commit 55285e21f045 "fbdev/efifb: Release PCI device ..."

Christian König <christian.koenig@xxxxxxx> · Tue, 21 Dec 2021 08:51:58 +0100

Good morning guys,

First of all, get well soon, Linus.

I'm unfortunately not the best expert for runtime power management 
(Alex) nor display (Harry), but from the lack of their response I guess 
that they are already on vacation. So maybe take everything I explain 
here with a grain of salt.

Then for the background: we have two separate power management features 
here which don't seem to work as they should.

The first buggy one is runtime power management, which is what commit 
55285e21f045 surfaces. My educated guess is that the now corrected 
reference counting turns off the GPU before userspace has a chance to 
send a signal to the monitor to turn off its backlight. Double checking 
the code I can see the correct calls to pm_runtime_get_*() and 
pm_runtime_put_*() in amdgpu_dm_atomic_commit_tail(), but to be honest 
that function seems to be quite a mess.

A trace of what exactly happens during PM autosuspend might help here. 
Daniel, do you know of any tracepoint for that?
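
Short of a proper trace, the runtime PM state of the device can at least be sampled from sysfs while the screen is being locked. The following is only a rough sketch: it assumes the standard `power/` attributes that the PM core exposes for a device, the helper names `read_runtime_pm()` and `watch()` are made up for illustration, and the PCI address in the usage comment is the one from Linus' log line.

```python
import time
from pathlib import Path

# Runtime PM attributes found under <device>/power/ in sysfs.
ATTRS = (
    "control",
    "runtime_status",
    "runtime_active_time",
    "runtime_suspended_time",
    "autosuspend_delay_ms",
)


def read_runtime_pm(device_dir):
    """Return a dict of the runtime PM attributes of one device.

    Attributes that are missing or unreadable are reported as None.
    """
    power = Path(device_dir) / "power"
    state = {}
    for name in ATTRS:
        try:
            state[name] = (power / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


def watch(device_dir, samples=10, interval=1.0):
    """Print runtime_status a fixed number of times, one line per sample."""
    for _ in range(samples):
        state = read_runtime_pm(device_dir)
        print(time.strftime("%H:%M:%S"), state["runtime_status"])
        time.sleep(interval)


# Example (hypothetical invocation, address taken from the log in this thread):
#   watch("/sys/bus/pci/devices/0000:49:00.0", samples=60)
```

If `runtime_status` flips to "suspended" before the monitor has reported "No signal", that would at least be consistent with the guess above, but it is no substitute for a real trace.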

Then we have DPMS, which is basically the way of telling the monitor to 
shut off its backlight. When this doesn't work as expected (e.g. you 
need *two* cycles) then it can as well be that userspace is not sending 
the right command.

When you use X you could double check with "xset dpms force off" and 
"xset dpms force suspend". At least with my monitor it turns off the 
backlight in both cases, but maybe your hardware behaves differently.

Regards,
Christian.

Am 20.12.21 um 23:21 schrieb Linus Torvalds:
[ Adding back in more amd people and the amd list, the people Daniel
added seem to have gotten lost again, but I think people at least saw
my original report thanks to Daniel ]

With "amdgpu.runpm=0", things are better, but not perfect. With that I
can lock the screen, and it has to go through *two* cycles of "No
signal, turning off", but on the second cycle it does finally work.

This was exposed by commit 55285e21f045 ("fbdev/efifb: Release PCI
device's runtime PM ref during FB destroy"), probably because that
made runtime PM actually potentially work, but it is then broken on
amdgpu.

Absolutely nothing odd in my setup. Two monitors, one GPU. PCI ID
1002:67df rev e7, subsystem ID 1da2:e353.

I'd expect pretty much any amdgpu person to see this.

On Mon, Dec 20, 2021 at 2:04 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
Note: on my machine, I get that

    amdgpu 0000:49:00.0: amdgpu: Using BACO for runtime pm

so maybe the other possible runtime pm models (ARPX and BOCO) are ok,
and it's only that BACO case that is broken.
Hmm. The *documentation* says:

     PX runtime pm
         2 = force enable with BAMACO,
         1 = force enable with BACO,
         0 = disable,
         -1 = PX only default

but the code actually makes anything != 0 enable it, except on VEGA20
and ARCTURUS, where it needs to be positive.

My card is apparently "POLARIS10", whatever that means, which means
that any non-zero value of amdgpu_runtime_pm will enable runtime PM as
long as "amdgpu_device_supports_baco()" is true. Which it is.

Whatever. Now I'm just kwetching about the documentation not matching
what I see the code doing, which is never a great sign when things
don't work.
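
To make the mismatch concrete, here is a small model of the behaviour as described above. It is not the driver source: the function name `runpm_enabled()` and the string ASIC names are purely illustrative, and it only covers the BACO path discussed in this message.

```python
# ASICs that, per the description above, need a positive value.
NEEDS_POSITIVE = {"VEGA20", "ARCTURUS"}


def runpm_enabled(runpm, asic, supports_baco):
    """Model of the described logic for the BACO case.

    runpm         -- value of the amdgpu.runpm module parameter
    asic          -- ASIC name as a plain string (illustrative only)
    supports_baco -- stand-in for amdgpu_device_supports_baco()
    """
    if not supports_baco:
        return False
    if runpm == 0:
        return False
    if asic in NEEDS_POSITIVE:
        return runpm > 0
    # Everything else: any non-zero value enables runtime PM.
    return True


# The documented default of -1 ("PX only default") on the card in question:
print(runpm_enabled(-1, "POLARIS10", True))  # True
print(runpm_enabled(-1, "VEGA20", True))     # False
print(runpm_enabled(0, "POLARIS10", True))   # False
```

So under this reading, the documented "-1 = PX only default" still ends up enabling BACO runtime PM on a POLARIS10 card.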

               Linus