Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2

ckoenig.leichtzumerken@xxxxxxxxx (Christian König) · Wed, 3 Jan 2018 13:34:02 +0100

In this case please open a bug report on fdo and describe exactly how to 
reproduce it.

Marek should be able to take a look then.

Thanks,
Christian.

On 03.01.2018 at 12:56, Luís Mendes wrote:
> Hi Christian, David,
>
> David, replying to your question... The issue is indeed reproducible
> on x86, I just did it with kodi and the same VP9 video. So it is not
> arm specific.
>
> Regards,
> Luís
>
> On Wed, Jan 3, 2018 at 11:02 AM, Luís Mendes <luis.p.mendes at gmail.com> wrote:
>> Hi Christian,
>>
>> Replies follow in between.
>>
>> Regards,
>> Luís
>>
>> On Wed, Jan 3, 2018 at 9:37 AM, Christian König
>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>> Hi Luis,
>>>
>>> In general please add information like /proc/iomem and dmesg as attachment
>>> and not mangled inside the mail.
>> Ok, I'll take that into account next time. Sorry for the inconvenience.
>>
>>> The good news is that your ARM board at least has a memory layout which
>>> should work in theory. So at least one problem is ruled out.
>> Ok, nice.
>>
>>> I don't think that apitrace would be of much help in this case as long as no
>>> developer has access to one of those ARM boards. But it is interesting that
>>> the apitrace reliably reproduces the issue. This means that it isn't
>>> something random, but rather a specific timing of things.
>> I am afraid I don't currently have any boards that I can send. I am
>> developing one, but it will still take some time before I have one
>> ready.
>>
>> I've checked the apitrace and there is a common call
>> glXSwapBuffers(dpy=0x1389f00, drawable=52428803) that I believe will
>> trigger the page flip. I suspect there is a race condition with
>> glXSwapBuffers in mesa or amdgpu that corrupts some of the data sent
>> to the GPU, causing a hang.
>> It seems to me that the GPU lockup only happens when doing a page
>> flip, since the kernel hangs with:
>> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
>> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
>> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
>> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>] (schedule_timeout+0x228/0x444)
>> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>] (dma_fence_default_wait+0x2b4/0x2d8)
>> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>] (dma_fence_wait_timeout+0x40/0x150)
>> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>] (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>> ...
>>
>> I will try to reproduce this on x86 with a similar software stack
>> and the apitrace traces I got.
>> What do you think, does this make sense? Do you have further
>> suggestions that may help pin down the problem?
>>
>> Another strange thing: the traces that were consistently causing
>> hangs yesterday are having a bit more difficulty causing them today,
>> but if I play the video with kodi it hangs easily again. Both kodi and
>> glretrace always hang with similar kernel backtraces, like the one
>> above.
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx