[Bug 206475] amdgpu under load drop signal to monitor until hard reset

https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #17 from Andrew Ammerlaan (andrewammerlaan@xxxxxxxxxx) ---
(In reply to Alex Deucher from comment #16)
> When the GPU is in reset all reads to the MMIO BAR return 1s so you are just
> getting all ones until the reset succeeds.  511 is just all ones.  This
> patch will fix that issue:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9271dfd9e0f79e2969dcbe28568bce0fdc4f8f73

Well, there goes my hypothesis of the broken thermal sensor xD.
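
To make the "511 is just all ones" point concrete, here is a minimal Python
sketch of what happens when a register read returns all ones and a bit field
is then masked out of it. The 9-bit width and the zero shift are assumptions
chosen only because 511 is nine set bits; the real register layout is not
given in this thread, and extract_field is a hypothetical helper, not a
driver function.

```python
# Value seen on a 32-bit MMIO read while the device does not respond
# (assumption for illustration: every bit reads back as 1).
ALL_ONES_32 = 0xFFFFFFFF


def extract_field(raw: int, shift: int, width: int) -> int:
    """Return the `width`-bit field that starts at bit `shift` of `raw`."""
    mask = (1 << width) - 1
    return (raw >> shift) & mask


def looks_like_dead_read(raw: int) -> bool:
    """True if the raw value is all ones, i.e. probably not real data."""
    return raw == ALL_ONES_32


# Hypothetical 9-bit temperature field at bit offset 0.
temperature = extract_field(ALL_ONES_32, shift=0, width=9)
print(temperature)                        # 511
print(looks_like_dead_read(ALL_ONES_32))  # True
```

Any 9-bit field gives 511 here regardless of its offset, so the value says
nothing about the sensor itself.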

I did discover yesterday that the fan of my GPU spins relatively slowly under
high load. When the GPU reached ~80 degrees Celsius, the fan didn't even spin
at half the maximum RPM! I used the pwmconfig script and the fancontrol service
from lm_sensors to force the fan to go to the maximum RPM just before reaching
80 degrees Celsius. It's very noisy, *but* the GPU stays well below 70 degrees
Celsius now, even under heavy load. As this issue seems to occur only when the
GPU is hotter than ~75 degrees Celsius, I'm hoping that this will help in
preventing the problem.
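
The kind of fan curve described above can be sketched in Python as a simple
linear mapping from temperature to a PWM value. This is only an illustration
and not the fancontrol implementation: the function name pwm_for_temp and all
thresholds (50 and 78 degrees Celsius, PWM 80 to 255) are hypothetical values,
picked so that the fan reaches full speed just before 80 degrees Celsius. The
0-255 range assumes the usual hwmon pwm scale.

```python
def pwm_for_temp(temp_c: float,
                 min_temp: float = 50.0,   # assumed: below this, fan idles
                 max_temp: float = 78.0,   # assumed: full speed from here on
                 min_pwm: int = 80,        # assumed idle PWM value
                 max_pwm: int = 255) -> int:
    """Map a temperature in degrees Celsius to a PWM value (0-255)."""
    if temp_c <= min_temp:
        return min_pwm
    if temp_c >= max_temp:
        return max_pwm
    # Linear interpolation between the two corner points of the curve.
    fraction = (temp_c - min_temp) / (max_temp - min_temp)
    return round(min_pwm + fraction * (max_pwm - min_pwm))


for temp in (45.0, 60.0, 70.0, 78.0, 85.0):
    print(f"{temp:5.1f} C -> PWM {pwm_for_temp(temp)}")
```

Lowering max_temp makes the fan reach full speed earlier, trading more noise
for lower temperatures.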

I'm still confused as to why this is at all necessary. The critical temperature
is 91 degrees Celsius, so why do I encounter these issues at ~80?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel


