[Bug 206475] amdgpu under load drop signal to monitor until hard reset

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #14 from Andrew Ammerlaan (andrewammerlaan@xxxxxxxxxx) ---
I sort of worked around this too.

I changed two things:

1) The iGPU is now the primary GPU, and I use DRI_PRIME=1 to offload to the AMD
GPU. This has reduced the number of things that are rendered on the AMD card.
This didn't actually fix anything, but it did remove the need for a hard
reboot when the AMD GPU does a reset. Now, when the GPU resets, only the
applications that are rendered on the AMD card stop working; the desktop and
everything else stay functional.
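For reference, per-application offloading like this only needs the DRI_PRIME variable set in the launched program's environment. The sketch below is illustrative, not from the original report: `prime_env` and `run_offloaded` are hypothetical helper names, and it assumes a Mesa driver stack where DRI_PRIME=1 selects the secondary GPU (here the AMD card) while the primary GPU (the iGPU) keeps driving the desktop.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def prime_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with DRI_PRIME=1 added.

    The caller's mapping is never modified, so only the launched
    application is offloaded, not the rest of the session.
    """
    env = dict(os.environ if base is None else base)
    env["DRI_PRIME"] = "1"  # ask Mesa to render on the secondary GPU
    return env


def run_offloaded(cmd: Sequence[str]) -> int:
    """Launch one application on the secondary GPU and return its exit code."""
    return subprocess.run(list(cmd), env=prime_env()).returncode
```

For example, `run_offloaded(["glxinfo", "-B"])` should print a renderer string naming the AMD card if offloading works. Setting the variable per process rather than globally is what keeps a GPU reset confined to the offloaded applications.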

2) I added three fans to my PC. Although the card's thermal sensor never reported
reaching the critical temperature (it went up to 82 °C max; critical is
91 °C), there definitely does seem to be a correlation between high
temperatures and the occurrence of the resets. And more fans is always better
anyway.

I still experienced some resets after switching the primary GPU to the iGPU,
but only if I really pushed it to its limits. I haven't had a single reset
since I added the fans. (Admittedly, though, I haven't run a decent stress test
yet, so it is still too early to conclude that the problem is completely gone.)

Since under-clocking the card worked for you, and adding fans seems to work for
me, I have a hunch that even though the thermal sensor doesn't report
problematic temperatures, some parts of the card do reach problematic
temperatures nonetheless, which might cause the issues that lead to a reset.
I'm not sure where the sensor is physically located, but considering that the
card is quite large, it doesn't seem far-fetched to me that there could be
quite a large temperature difference between two points on the card.
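One way to probe this hunch is to log every temperature channel the driver exposes, not just one. The sketch below is an illustration, not something from the original report. It assumes the standard Linux hwmon sysfs layout (`/sys/class/hwmon/hwmon*/name`, `temp*_input`, `temp*_label`, `temp*_crit`, values in millidegrees Celsius). It also assumes that, on some newer ASICs, amdgpu exposes more than one channel (labelled e.g. `edge`, `junction`, `mem`). The function names and the 10 °C margin are hypothetical.

```python
import glob
import os
from typing import Dict, List, Optional


def _read(path: str) -> Optional[str]:
    """Read a sysfs attribute, returning None if it is missing or unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def amdgpu_temps(hwmon_root: str = "/sys/class/hwmon") -> Dict[str, Dict[str, Optional[float]]]:
    """Collect all temperature channels of amdgpu hwmon devices, in °C."""
    result: Dict[str, Dict[str, Optional[float]]] = {}
    for dev in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        if _read(os.path.join(dev, "name")) != "amdgpu":
            continue  # skip CPU sensors, NVMe drives, etc.
        for inp in sorted(glob.glob(os.path.join(dev, "temp*_input"))):
            prefix = inp[: -len("_input")]  # e.g. .../hwmon3/temp2
            label = _read(prefix + "_label") or os.path.basename(prefix)
            values: Dict[str, Optional[float]] = {}
            for attr in ("input", "crit"):
                raw = _read(f"{prefix}_{attr}")
                values[attr] = int(raw) / 1000.0 if raw else None  # millidegrees -> °C
            result[f"{os.path.basename(dev)}/{label}"] = values
    return result


def near_critical(temps: Dict[str, Dict[str, Optional[float]]], margin_c: float = 10.0) -> List[str]:
    """Return channels within margin_c degrees of their own critical threshold."""
    return [
        name
        for name, v in temps.items()
        if v["input"] is not None and v["crit"] is not None and v["crit"] - v["input"] <= margin_c
    ]
```

Polling `amdgpu_temps()` once a second during a stress test and noting the readings just before a reset would show whether a channel other than the one being watched runs much hotter.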

Perhaps this card could benefit from a second thermal sensor or earlier and/or
more aggressive thermal throttling.

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
