[Bug 206475] amdgpu under load drop signal to monitor until hard reset

https://bugzilla.kernel.org/show_bug.cgi?id=206475

Marco (rodomar705@xxxxxxxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |ANSWERED

--- Comment #20 from Marco (rodomar705@xxxxxxxxxxxxxx) ---
I finally found where the problem was and fixed it completely. It was hardware.
The heatsink was not making full contact with a section of the MOSFETs that
feed power to the core of the card. Under full load they were overheating,
tripping their thermal protection, and completely stalling the card to avoid
damaging themselves. The problem was that this card does not report their
temperatures to software, even if the VRM controller itself can read them (or
perhaps it shuts down only when the MOSFETs assert a signal indicating a
thermal runaway condition). This was hell to debug and fix, as always with
hardware problems, but after stress tests on both Windows and Linux at full
clocks, the issue is no longer present.
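
For anyone chasing something similar, it can help to check which temperatures
the driver actually exposes before trusting software monitoring. The following
is only a minimal sketch, assuming the standard Linux hwmon sysfs layout
(/sys/class/hwmon/hwmonN/name, tempN_input in millidegrees Celsius, optional
tempN_label). The function name read_hwmon_temps is made up for illustration,
and which sensors appear depends on the card. If no VRM or MOSFET sensor is
listed, this kind of overheating will not show up in software at all.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon", driver="amdgpu"):
    """Return {"hwmonN/label": degrees_C} for each temp sensor of `driver`.

    Assumes the standard hwmon sysfs layout; sensors that cannot be read
    are skipped rather than reported as errors.
    """
    temps = {}
    for hwmon in sorted(Path(root).glob("hwmon*")):
        name_file = hwmon / "name"
        # Only look at hwmon devices registered by the requested driver.
        if not name_file.is_file() or name_file.read_text().strip() != driver:
            continue
        for input_file in sorted(hwmon.glob("temp*_input")):
            label_file = input_file.with_name(
                input_file.name.replace("_input", "_label")
            )
            # Fall back to the file name when the sensor has no label.
            if label_file.is_file():
                label = label_file.read_text().strip()
            else:
                label = input_file.name
            try:
                millidegrees = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue
            temps[f"{hwmon.name}/{label}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for label, celsius in read_hwmon_temps().items():
        print(f"{label}: {celsius:.1f} C")
```

Running it during a stress test shows whether any reported sensor gets close to
its limit before the card stalls. In my case none would have, since the
MOSFET temperatures were not exposed.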

I'll keep my optimized clocks for lower temperatures and less fan noise, but
for me the issue wasn't software.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel


