GPU hang trying to run OpenCL kernels on x86_64

Hi,

The issue remains in kernel 4.17.5, tested with a SAPPHIRE RX 550 4GB running
the same OpenCL kernel.

[  227.443025] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=22, last emitted seq=25
[  227.443112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signaled seq=22, last emitted seq=24
[  227.443117] [drm] IP block:gmc_v8_0 is hung!
[  227.443120] [drm] IP block:tonga_ih is hung!
[  227.443123] [drm] IP block:gfx_v8_0 is hung!
[  227.443124] [drm] IP block:gmc_v8_0 is hung!
[  227.443126] [drm] IP block:tonga_ih is hung!
[  227.443127] [drm] IP block:sdma_v3_0 is hung!
[  227.443128] [drm] IP block:uvd_v6_0 is hung!
[  227.443130] [drm] IP block:gfx_v8_0 is hung!
[  227.443132] [drm] IP block:sdma_v3_0 is hung!
[  227.443133] [drm] IP block:vce_v3_0 is hung!
[  227.443134] [drm] GPU recovery disabled.
[  227.443137] [drm] IP block:uvd_v6_0 is hung!
[  227.443140] [drm] IP block:vce_v3_0 is hung!
[  227.443141] [drm] GPU recovery disabled.
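
For anyone comparing these logs across kernel versions, here is a minimal Python sketch that summarizes a dmesg excerpt like the one above. It only recognizes the two message shapes quoted in this thread. The function name `summarize` and the `pending` field are hypothetical, and reading "emitted minus signaled" as the number of jobs still outstanding on a ring is an assumption, not something the driver output states.

```python
import re
from collections import Counter

# Matches the ring timeout message, tolerating mail-client line wraps and
# ">" quote markers between the ring name and the word "timeout".
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\w+)[\s>]+timeout, "
    r"last signaled seq=(?P<signaled>\d+), last emitted seq=(?P<emitted>\d+)"
)
# Matches the "IP block:<name> is hung!" lines.
HUNG_RE = re.compile(r"IP block:(?P<block>\w+) is hung!")

# Excerpt copied from the log in this message.
SAMPLE = """\
[  227.443025] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
timeout, last signaled seq=22, last emitted seq=25
[  227.443112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout, last signaled seq=22, last emitted seq=24
[  227.443117] [drm] IP block:gmc_v8_0 is hung!
[  227.443120] [drm] IP block:tonga_ih is hung!
[  227.443123] [drm] IP block:gfx_v8_0 is hung!
"""


def summarize(log_text):
    """Return (timeouts per ring, hung-block counts) for a dmesg excerpt."""
    timeouts = {}
    for m in TIMEOUT_RE.finditer(log_text):
        signaled = int(m.group("signaled"))
        emitted = int(m.group("emitted"))
        timeouts[m.group("ring")] = {
            "signaled": signaled,
            "emitted": emitted,
            # Assumption: the gap is the number of jobs not yet completed.
            "pending": emitted - signaled,
        }
    hung = Counter(m.group("block") for m in HUNG_RE.finditer(log_text))
    return timeouts, hung


if __name__ == "__main__":
    ring_timeouts, hung_blocks = summarize(SAMPLE)
    for ring, info in sorted(ring_timeouts.items()):
        print(ring, info)
    for block, count in sorted(hung_blocks.items()):
        print(block, count)
```

Feeding it the full output of `dmesg` from each tested kernel makes it easier to see whether the same rings and IP blocks are reported every time.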

Regards,
Luís

On Tue, Jun 26, 2018 at 10:03 AM, Luís Mendes <luis.p.mendes at gmail.com>
wrote:

>
> I've tested Ubuntu 18.04 with kernel 4.17.2 using libdrm-2.4.92 and
> mesa-18.1.0, and the AMD RX 550 4GB still hangs when running the identified
> OpenCL kernels.
>
> [  548.704916] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=30, last emitted seq=33
> [  548.704988] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signaled seq=29, last emitted seq=31
> [  548.704992] [drm] IP block:gmc_v8_0 is hung!
> [  548.704994] [drm] IP block:tonga_ih is hung!
> [  548.704996] [drm] IP block:gmc_v8_0 is hung!
> [  548.704997] [drm] IP block:gfx_v8_0 is hung!
> [  548.704998] [drm] IP block:sdma_v3_0 is hung!
> [  548.704999] [drm] IP block:tonga_ih is hung!
> [  548.705000] [drm] IP block:uvd_v6_0 is hung!
> [  548.705001] [drm] IP block:gfx_v8_0 is hung!
> [  548.705002] [drm] IP block:sdma_v3_0 is hung!
> [  548.705003] [drm] IP block:uvd_v6_0 is hung!
> [  548.705004] [drm] IP block:vce_v3_0 is hung!
> [  548.705005] [drm] GPU recovery disabled.
> [  548.705006] [drm] IP block:vce_v3_0 is hung!
> [  548.705007] [drm] GPU recovery disabled.
>
> Is there any news regarding this issue?
>
> Regards,
> Luís
>
> On Fri, May 25, 2018 at 11:23 AM, Luís Mendes <luis.p.mendes at gmail.com>
> wrote:
>
>> I've just tested Ubuntu 18.04 with kernel 4.17-rc6 using libdrm-2.4.92
>> and mesa-18.1.0.
>> Now both sdma0 and sdma1 time out, as can be seen in the attached logs.
>>
>> The drm-next-4.18 branch from ~agd5f doesn't improve things either.
>>
>> I have also tried amdgpu-pro 18.20 on both Ubuntu 18.04 and 16.04, with no
>> improvement.
>> I have tried amdgpu-pro 18.10 and 17.50 as well, also with no improvement.
>>
>> ./amdgpu-pro-install -opencl=legacy,pal --headless
>>
>> On Thu, May 24, 2018 at 11:18 AM, Luís Mendes <luis.p.mendes at gmail.com>
>> wrote:
>>
>>> Additional update...
>>>
>>> I was able to boot and enter X by installing an NVIDIA GTX 1050 Ti as
>>> the primary display card and using an AMD RX 550 as the secondary card on
>>> the Tyan S7025 with the same Ubuntu 18.04 and the same Linux kernel
>>> 4.17-rc6.
>>> However, once I try to run an OpenCL kernel on the RX 550 I get an sdma1
>>> timeout and the GPU hangs, which is likely what happens when I boot with
>>> the RX 550 as the only GPU in the system.
>>>
>>> This means the issue was not introduced in 4.17-rc6; I simply had not
>>> noticed the difference between the system with two GPUs and the system
>>> with a single AMD GPU.
>>>
>>> The dmesg log is attached.
>>>
>>> Luís
>>>
>>> On Thu, May 24, 2018 at 10:13 AM, Luís Mendes <luis.p.mendes at gmail.com>
>>> wrote:
>>>
>>>> Hi Michel,
>>>>
>>>> I also work as a researcher at a university and we are considering
>>>> buying AMD cards to do OpenCL computations for numerical modelling, but
>>>> currently I am unable to try out the AMD cards I have at home.
>>>> I couldn't find any working driver for them. The amdgpu-pro drivers
>>>> don't work either, or at least I have been unable to make them work.
>>>>
>>>> Regards,
>>>> Luís
>>>>
>>>> On Thu, May 24, 2018 at 10:01 AM, Luís Mendes <luis.p.mendes at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Michel,
>>>>>
>>>>> So, to summarize: with Linux kernel 4.17-rc6 on Ubuntu 18.04, using an
>>>>> AMD RX 460/RX 550, I am not able to enter X.
>>>>> The same system with an AMD Radeon R7 240 not only enters X but also
>>>>> runs the OpenCL kernels that the RX 460 / RX 550 are unable to run, for
>>>>> all the kernels that I have tested.
>>>>> Could this also be a Mesa issue regarding OpenCL on the RX 460?
>>>>>
>>>>> Regards,
>>>>> Luís
>>>>>
>>>>> On Thu, May 24, 2018 at 9:55 AM, Luís Mendes <luis.p.mendes at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Michel,
>>>>>>
>>>>>> I will have to check previous rc releases of 4.17 to see whether it
>>>>>> was already happening, before trying any git bisect.
>>>>>> As an update, I can say that an AMD Radeon R7 240 works fine on the
>>>>>> same system with the same kernel, and I am able to run the OpenCL
>>>>>> kernels that I couldn't run with the RX 460/RX 550.
>>>>>>
>>>>>> Regards,
>>>>>> Luís
>>>>>>
>>>>>> On Thu, May 24, 2018 at 9:30 AM, Michel Dänzer <michel at daenzer.net>
>>>>>> wrote:
>>>>>>
>>>>>>> On 2018-05-24 12:06 AM, Luís Mendes wrote:
>>>>>>> > I've tried Linux 4.17-rc6 with Ubuntu 18.04 on Tyan S7002 and I am not even
>>>>>>> > able to see lightdm/gdm3, as the system hangs when starting X.
>>>>>>> > Having SR-IOV enabled or disabled makes no difference.
>>>>>>> > Tested with AMD RX 460.
>>>>>>> > When X is supposed to start, the system hangs and only a rectangular region
>>>>>>> > in the top left corner of the screen remains, showing console text messages
>>>>>>> > from the boot process, while the rest of the screen is just black. I am unable
>>>>>>> > to do anything with the keyboard: switching to a console does not work and
>>>>>>> > ctrl-alt-del doesn't work either. I have to do a cold reset.
>>>>>>>
>>>>>>> Can you isolate which change introduced this new issue with git
>>>>>>> bisect?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Earthling Michel Dänzer               |               http://www.amd.com
>>>>>>> Libre software enthusiast             |             Mesa and X developer
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>