Hi Christian,

Replies follow inline.

Regards,
Luís

On Wed, Jan 3, 2018 at 9:37 AM, Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
> Hi Luis,
>
> In general please add information like /proc/iomem and dmesg as attachment
> and not mangled inside the mail.

Ok, I'll take that into account next time. Sorry for the inconvenience.

>
> The good news is that your ARM board at least has a memory layout which
> should work in theory. So at least one problem rules out.

Ok, nice.

>
> I don't think that apitrace would be much helpful in this case as long as no
> developer has access to one of those ARM boards. But it is interesting that
> the apitrace reliable reproduces the issue. This means that it isn't
> something random, but rather a specific timing of things.

I'm afraid I don't have any boards that I can send yet. I am developing one, but it will take some time before it is ready.

I've checked the apitrace and there is a common call

  glXSwapBuffers(dpy=0x1389f00, drawable=52428803)

that I believe triggers the page flip. I suspect there is a race condition with glXSwapBuffers in mesa or amdgpu that corrupts some of the data sent to the GPU, causing a hang.

What I believe is happening is that the GPU lockup only occurs when doing a page flip, since the kernel locks up with:

[ 243.693200] kworker/u4:3 D 0 89 2 0x00000000
[ 243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
[ 243.693259] [<80b8cdd0>] (schedule) from [<80b91024>] (schedule_timeout+0x228/0x444)
[ 243.693270] [<80b91024>] (schedule_timeout) from [<80886738>] (dma_fence_default_wait+0x2b4/0x2d8)
[ 243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>] (dma_fence_wait_timeout+0x40/0x150)
[ 243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>] (reservation_object_wait_timeout_rcu+0xfc/0x34c)
[ 243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
[ 243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
...

That is, the commit worker is blocked in amdgpu_dm_do_flip, waiting on a fence that apparently never signals.

I will try to reproduce this on x86 with a similar software stack and the apitrace traces I got.

What do you think, does this make sense? Do you have further suggestions that may help pin down the problem?

Another strange thing: the traces that were consistently causing hangs yesterday are having a bit more difficulty causing them today, but if I play the video with kodi it hangs easily again. Both kodi and glretrace always hang with similar kernel backtraces, like the one above.
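
To check whether the kodi and glretrace hangs really end up in the same place, the call chains can be extracted from the dmesg output and compared. The following is only a minimal sketch: the helper names call_chain and same_hang are hypothetical, the regular expression assumes the ARM backtrace format shown above ("[<addr>] (function) from ..."), and SAMPLE is just three lines copied from that backtrace.

```python
import re

# Matches the callee part of an ARM-style backtrace line, e.g.
#   "[<80885d60>] (dma_fence_wait_timeout) from ..."
# and tolerates a trailing module tag such as " [amdgpu]".
# Assumption: the log uses the format quoted in this mail.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] \((\w+)(?: \[\w+\])?\) from")

# Three lines copied from the backtrace above, used as example input.
SAMPLE = """\
[ 243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>] (reservation_object_wait_timeout_rcu+0xfc/0x34c)
[ 243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
[ 243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
"""


def call_chain(dmesg_text):
    """Return the callee function names in log order (innermost first)."""
    return FRAME_RE.findall(dmesg_text)


def same_hang(trace_a, trace_b):
    """True if two backtraces go through the same sequence of functions."""
    return call_chain(trace_a) == call_chain(trace_b)


if __name__ == "__main__":
    for name in call_chain(SAMPLE):
        print(name)
```

Addresses and offsets are deliberately ignored, so two traces count as the same hang as long as they pass through the same functions in the same order.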