Hi everyone,

I've tested the kernel from amd-drm-next-4.17-wip at commit
9ab2894122275a6d636bb2654a157e88a0f7b9e2 (drm/amdgpu: set DRIVER_ATOMIC
flag early) on ARMv7l, and the reported issues seem to be gone. I
haven't checked which commit fixed them, but they are fixed now! I also
noticed a performance improvement in one of the glmark2 tests.

There seem to be some other small issues, possibly unrelated: sometimes,
during video playback, the screen goes black and the sound stops for a
second or less, and then normal playback resumes. This happens rarely,
at most once per power cycle, while using X and Kodi, even though I have
played many individual videos and power cycled the machine a few times.

I've also observed what was already reported when watching non-VP9
videos:

[ 591.729558] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.740255] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.750968] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.761628] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.772248] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.782672] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.793172] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.803681] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.814129] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.824560] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.835054] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.845437] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.855860] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.866415] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.876945] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
[ 591.887454] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!

Regards,
Luís Mendes

On Wed, Jan 3, 2018 at 11:08 PM, Luís Mendes <luis.p.mendes at gmail.com> wrote:
> Hi Michel, Christian,
>
> Michel, I have tested amd-staging-drm-next at commit "drm/amdgpu/gfx9:
> only init the apertures used by KGD (v2)" -
> 0e4946409d11913523d30bc4830d10b388438c7a and the issues remain, both
> on ARMv7 and on x86 amd64.
>
> Christian, in fact if I replay the apitraces obtained on the ARMv7
> platform on the AMD64 I am also able to reproduce the GPU hang! So it
> is not ARM platform specific. Should I send/upload the apitraces? I
> have two of them, typically when one doesn't hang the gpu the other
> hangs. One takes about 1GB of disk space while the other takes 2.3GB.
> ...
> [ 69.019381] ISO 9660 Extensions: RRIP_1991A
> [ 213.292094] DMAR: DRHD: handling fault status reg 2
> [ 213.292102] DMAR: [INTR-REMAP] Request device [00:00.0] fault index
> 1c [fault reason 38] Blocked an interrupt request due to source-id
> verification failure
> [ 223.406919] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, last signaled seq=25158, last emitted seq=25160
> [ 223.406926] [drm] IP block:tonga_ih is hung!
> [ 223.407167] [drm] GPU recovery disabled.
>
> Regards,
> Luís
>
>
> On Wed, Jan 3, 2018 at 5:47 PM, Luís Mendes <luis.p.mendes at gmail.com> wrote:
>> Hi Michel, Christian,
>>
>> Christian, I have followed your suggestion and I have just submitted a
>> bug to fdo at https://bugs.freedesktop.org/show_bug.cgi?id=104481 -
>> GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7
>> platforms while playing video.
>>
>> Michel, amdgpu.dc=0 seems to make no difference. I will try
>> amd-staging-drm-next and report back.
>>
>> Regards,
>> Luís
>>
>> On Wed, Jan 3, 2018 at 5:09 PM, Michel Dänzer <michel at daenzer.net> wrote:
>>> On 2018-01-03 12:02 PM, Luís Mendes wrote:
>>>>
>>>> What I believe it seems to be the case is that the GPU lock up only
>>>> happens when doing a page flip, since the kernel locks with:
>>>> [ 243.693200] kworker/u4:3 D 0 89 2 0x00000000
>>>> [ 243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
>>>> [ 243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
>>>> [ 243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
>>>> (schedule_timeout+0x228/0x444)
>>>> [ 243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
>>>> (dma_fence_default_wait+0x2b4/0x2d8)
>>>> [ 243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
>>>> (dma_fence_wait_timeout+0x40/0x150)
>>>> [ 243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
>>>> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>>>> [ 243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
>>>> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>>>> [ 243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
>>>> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>>>> ...
>>>
>>> Does the problem also occur if you disable DC with amdgpu.dc=0 on the
>>> kernel command line?
>>>
>>> Does it also happen with a kernel built from the amd-staging-drm-next
>>> branch instead of drm-next-4.16?
>>>
>>>
>>> --
>>> Earthling Michel Dänzer | http://www.amd.com
>>> Libre software enthusiast | Mesa and X developer