Comment #25 on bug 105733 from Andrey Grodzovsky

(In reply to Allan from comment #12)
> My system started to power down for nothing sometimes, even using the
> GTX1070 (nvidia|nouveau) .
> Then I installed a Windows image just to be sure if the kernel was the
> problem.
>
> Well, for now it *SEEMS* that isn't *ONLY* the driver/kernel :
> - The RX480 was freezing in the same way, then I sent it for warranty.
> - RX580 run problematically, almost always I got a message like : "DX11 :
> device disconnected" or "Mantle : Device lost".
> - GTX1070 was running fine for 1 day, then it became the same as the RX580
> and for my bad luck the system started to power down after a random time
> (5min to 2 hours +/-).
>
> For sure the driver/kernel (amdgpu/linux) has its faults here, and here's
> why:
> - At Windows, the only card that stuck the system was RX480 sometimes
> because it was really broken.
> - In other cases, when a failure happened (with Nvidia or AMD), the system
> was able to retake the control over the device.
> - Maybe doing a soft-reset?
> - Maybe just killing the driver and starting again?
> - Maybe just by stopping the process that were using the GPU to avoid a big
> chain of resulting problems?
> - Neither the RX580 nor GTX1070 has dual-bios AFAIK. Maybe RX480, but I did
> not test it.
>
> Then :
> - Revised and changed the PCI-Ex power lines : OK.
> - Tested power supply (lucky for me AX860i has a self test) : OK.
> - Cleaned all slots with a brush : OK.
> - Tested again CPU and RAM : OK.
>
> But , I must be in a very bad luck, the problems persisted.
>
> I've sent the Motherboard for warranty. I'm waiting for its diagnostic and
> solution.
>
> I'll inform here as soon as it becomes possible.
>
> Thoughts for the while :
> - Not being able to kill the processes *is* a problem that concerns only
> amdgpu and it is either a problem of the driver itself (most likely to be)
> or of the kernel.

We recently fixed the issue where a process stuck in kernel mode waiting for a fence signal, as yours was, could not be killed. Can you build the latest kernel (4.18), grab the latest firmware, and try again?

Links to the kernel and firmware:
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/

> - The driver is not capable of retaking control of the device.
> - It is impossible to kill children pids when something hung using amdgpu.
> - Yes, it occurred once or twice using nvidia proprietary too, but it was
> probably caused because of the faulty motherboard that I'm waiting to be
> fixed.
> - Using nouveau was the most happy path , but unfortunately nouveau does not
> support Pascal at all yet. It keeps the card at the min clock (300 or
> 400MHz) and it is not possible yet to increase the speed of the card. So it
> is not a valid working way.