Hi Christian, David, David,

Replying to your question... The issue is indeed reproducible on x86: I
just reproduced it with Kodi and the same VP9 video, so it is not
ARM-specific.

Regards,
Luís

On Wed, Jan 3, 2018 at 11:02 AM, Luís Mendes <luis.p.mendes at gmail.com> wrote:
> Hi Christian,
>
> Replies follow in between.
>
> Regards,
> Luís
>
> On Wed, Jan 3, 2018 at 9:37 AM, Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>> Hi Luis,
>>
>> In general please add information like /proc/iomem and dmesg as an
>> attachment and not mangled inside the mail.
>
> Ok, I'll take that into account next time. Sorry for the inconvenience.
>
>>
>> The good news is that your ARM board at least has a memory layout which
>> should work in theory. So at least one problem is ruled out.
>
> Ok, nice.
>
>>
>> I don't think that apitrace would be much help in this case as long as no
>> developer has access to one of those ARM boards. But it is interesting that
>> the apitrace reliably reproduces the issue. This means that it isn't
>> something random, but rather a specific timing of things.
>
> I am afraid I currently don't have boards that I can send. I am
> developing one, but it will still take some time before I have one
> ready.
>
> I've checked the apitrace and there is a common call,
> glXSwapBuffers(dpy=0x1389f00, drawable=52428803), that I believe
> triggers the page flip. I suspect there is a race condition with
> glXSwapBuffers in mesa or amdgpu that corrupts some of the data sent
> to the GPU, causing a hang.
>
> What seems to be the case is that the GPU lockup only happens when
> doing a page flip, since the kernel locks up with:
> [ 243.693200] kworker/u4:3 D 0 89 2 0x00000000
> [ 243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
> [ 243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
> [ 243.693259] [<80b8cdd0>] (schedule) from [<80b91024>] (schedule_timeout+0x228/0x444)
> [ 243.693270] [<80b91024>] (schedule_timeout) from [<80886738>] (dma_fence_default_wait+0x2b4/0x2d8)
> [ 243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>] (dma_fence_wait_timeout+0x40/0x150)
> [ 243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>] (reservation_object_wait_timeout_rcu+0xfc/0x34c)
> [ 243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
> [ 243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
> ...
>
> I will try to reproduce this on x86 with a similar software stack
> and the apitrace traces I got.
> What do you think, does this make sense? Do you have further
> suggestions that may help pin down the problem?
>
> Another strange thing: the traces that were consistently causing
> hangs yesterday are having a bit more difficulty causing them today,
> but if I play the video with Kodi it hangs easily again. Both Kodi and
> glretrace always hang with similar kernel backtraces, like the one
> above.
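
To compare backtraces from different hangs (Kodi vs. glretrace, ARM vs.
x86), the call chain can be pulled out of the dmesg text with a small
script. What follows is only a minimal sketch in Python, assuming
ARM-style "(callee) from (caller+offset)" frames as in the excerpt
quoted above, with each frame on a single unwrapped line. The names
extract_frames and flip_blocked_on_fence are hypothetical helpers made
up for illustration, not part of any existing tool.

```python
import re

# Matches frames such as "(schedule+0x4c/0xac)" or
# "(amdgpu_dm_do_flip [amdgpu])". The "+offset/size" part and the
# "[module]" part are both optional.
FRAME_RE = re.compile(
    r"\(([A-Za-z_]\w*)(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?(?:\s+\[\w+\])?\)"
)


def extract_frames(log_text):
    """Return function names in order of first appearance (innermost first)."""
    frames = []
    for name in FRAME_RE.findall(log_text):
        if name not in frames:
            frames.append(name)
    return frames


def flip_blocked_on_fence(frames):
    """True if the trace shows amdgpu_dm_do_flip waiting on a DMA fence."""
    return "amdgpu_dm_do_flip" in frames and "dma_fence_wait_timeout" in frames


if __name__ == "__main__":
    import sys

    # Usage (assumed): dmesg > hang.txt; python3 frames.py < hang.txt
    frames = extract_frames(sys.stdin.read())
    print(" <- ".join(frames))
    print("flip blocked on fence:", flip_blocked_on_fence(frames))
```

This only checks that both functions appear in the same blob of text, so
the input should be limited to one hung-task report at a time.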