Re: Regression in 6.6: trying to set DPMS mode kills radeon (r600)

On 2023-12-19 19:46, Alex Deucher wrote:
On Mon, Dec 18, 2023 at 1:52 PM Holger Hoffstätte
<holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:

On 2023-12-16 18:36, Holger Hoffstätte wrote:

<snip>
The affected machine is an older SandyBridge desktop with a fanless
r600 Redwood GPU, using the radeon driver. "Recently" - some time
after the last few 6.6.x stable updates - it started to die with GPU
lockups. I first blamed this on standby/resume - because why not? - but
this turned out to be wrong; the real culprit is DPMS.

I use xfce-power-manager as "screensaver" to turn off the display after
inactivity. This can be configured in two ways: "suspend" and "poweroff".
I've been using "poweroff" since forever without problems, until now.

The symptom is that everything works fine until the screensaver kicks in
and tries to turn the monitor off, which sends the radeon driver and the GPU
into a complete tailspin.

<snip>

Eventually the screensaver tries to switch off the monitor via the DPMS "poweroff" method and
this greatly upsets the GPU:

Dec 12 20:39:59 ragnarok kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10140msec
Dec 12 20:39:59 ragnarok kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000002 last fence id 0x0000000000000003 on ring 0)
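
For anyone who wants to scan a longer journal for the same symptom, the two messages above can be picked apart with a couple of regular expressions. This is only a minimal sketch: the patterns are derived from the two log lines quoted here, not from the radeon source, and the function name `parse_radeon_line` is made up for illustration.

```python
import re

# Patterns modelled on the two log lines quoted above (an assumption: other
# kernel versions may word these messages differently).
STALL_RE = re.compile(
    r"radeon (\S+): ring (\d+) stalled for more than (\d+)msec"
)
LOCKUP_RE = re.compile(
    r"radeon (\S+): GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def parse_radeon_line(line):
    """Return a dict describing a stall or lockup message, or None."""
    m = STALL_RE.search(line)
    if m:
        return {
            "event": "stall",
            "device": m.group(1),
            "ring": int(m.group(2)),
            "stalled_ms": int(m.group(3)),
        }
    m = LOCKUP_RE.search(line)
    if m:
        return {
            "event": "lockup",
            "device": m.group(1),
            # Fence ids are printed in hex; int(..., 16) accepts the 0x prefix.
            "current_fence": int(m.group(2), 16),
            "last_fence": int(m.group(3), 16),
            "ring": int(m.group(4)),
        }
    return None
```

In the lockup line above the current fence id (2) is one behind the last fence id (3), so the parsed values make it easy to see that one submitted fence never signalled.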

In the meantime I have confirmed that all this is still more complicated:
even using the "suspend" method only works after boot, not after a system suspend
cycle. Yes, weird but reproducible.
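
To trigger the DPMS transitions by hand instead of waiting for the screensaver, something like the following might help. It is a rough sketch that assumes an X11 session with the `xset` utility installed, and it assumes that the power manager's "suspend" and "poweroff" settings correspond to the states `xset` calls `suspend` and `off`. The helper names `dpms_command` and `force_dpms` are hypothetical.

```python
import subprocess

# Assumption: the "suspend" and "poweroff" settings in the power manager map
# to the DPMS states that xset names "suspend" and "off".
XSET_STATE = {"suspend": "suspend", "poweroff": "off", "on": "on"}


def dpms_command(mode):
    """Build the xset argument list that forces the given DPMS state."""
    if mode not in XSET_STATE:
        raise ValueError(f"unknown DPMS mode: {mode!r}")
    return ["xset", "dpms", "force", XSET_STATE[mode]]


def force_dpms(mode):
    """Run xset to force the display state (requires a running X session)."""
    subprocess.run(dpms_command(mode), check=True)
```

Calling `force_dpms("poweroff")` after boot and again after a system suspend cycle would exercise both cases described above; `force_dpms("on")` wakes the display again.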

I have tried to chase down the problematic release, and as suspected this
started to happen with 6.6.5; 6.6.4 is fine.

Based on this information I found the offending commits and reverted them
in order from 6.6.7, which fixes everything for me:

b0399e22ada0 "drm/amd/display: Remove power sequencing check"
45f98fccb1f6 "drm/amd/display: Refactor edp power control"

Those patches are for amdgpu.  From the logs in your original post,
you are using the radeon driver.  The two are completely separate
drivers.  I don't see how those patches could be related.  That code
would never even execute.

Hi,

I understand the difference between amdgpu and radeon, that's why I was
wondering why those patches would make a difference.

The crash/no-crash behaviour was definitely reproducible - same config
and clean rebuild every time etc. My only guess was that maybe one of the
touched headers got included in the drm-display-helper used by radeon as
well, but that is seemingly not the case either.

In any case, it seems that whatever was going on is fixed in stable-6.6.8-rc1;
at least I haven't been able to reproduce the lockup so far, with various
combinations of display suspend/resume. There's at least one EDID-related patch
in 6.6.8 but I don't understand enough about the various display technologies to
assess whether that could have played a role.

You can probably imagine how frustrating it is to have a GPU that deadlocks while
_not_ doing anything. At least it seems to be working again now, either way.

Thanks for reading!

cheers
Holger


