Hi Alexander,

I've cherry-picked the patch you pointed out into the kernel built from amd-drm-next-4.17-wip at commit 9ab2894122275a6d636bb2654a157e88a0f7b9e2 (drm/amdgpu: set DRIVER_ATOMIC flag early) and tested it on ARMv7l. The problem has indeed gone. It's working great on ARMv7l with an AMD RX 460.

Thanks,
Luís Mendes

On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander <Alexander.Deucher at amd.com> wrote:
> Fixed with this patch:
>
> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>
>
> Alex
>
> ________________________________
> From: Luís Mendes <luis.p.mendes at gmail.com>
> Sent: Tuesday, January 30, 2018 1:30 PM
> To: Michel Dänzer; Koenig, Christian
> Cc: Deucher, Alexander; Zhou, David(ChunMing); amd-gfx at lists.freedesktop.org
> Subject: Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
>
> Hi everyone,
>
> I've tested the kernel from amd-drm-next-4.17-wip at commit
> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 (drm/amdgpu: set DRIVER_ATOMIC
> flag early) on ARMv7l, and the reported issues now seem to be gone. I
> haven't checked which commit fixed them, but they are now fixed! I also
> noticed a performance improvement in one of the glmark2 tests.
>
> There seem to be some other small issues, possibly unrelated: sometimes
> the screen goes black and the sound stops for a second or less while
> playing a video, and then normal playback resumes. This happens rarely,
> at most once per power cycle, while using X and Kodi, even though I have
> played many individual videos and power cycled the machine several times.
>
> I've also observed what was already reported when watching non-VP9 videos:
> [ 591.729558] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.740255] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.750968] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.761628] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.772248] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.782672] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.793172] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.803681] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.814129] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.824560] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.835054] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.845437] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.855860] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.866415] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.876945] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
> [ 591.887454] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu: writing more dwords to the ring than expected!
>
> Regards,
> Luís Mendes
>
> On Wed, Jan 3, 2018 at 11:08 PM, Luís Mendes <luis.p.mendes at gmail.com> wrote:
>> Hi Michel, Christian,
>>
>> Michel, I have tested amd-staging-drm-next at commit "drm/amdgpu/gfx9:
>> only init the apertures used by KGD (v2)" (0e4946409d11913523d30bc4830d10b388438c7a),
>> and the issues remain on both ARMv7 and x86 amd64.
>>
>> Christian, in fact, if I replay the apitraces obtained on the ARMv7
>> platform on the AMD64 machine, I can also reproduce the GPU hang! So it
>> is not ARM platform specific. Should I send/upload the apitraces? I
>> have two of them; typically, when one doesn't hang the GPU, the other
>> does. One takes about 1 GB of disk space and the other 2.3 GB.
>> ...
>> [ 69.019381] ISO 9660 Extensions: RRIP_1991A
>> [ 213.292094] DMAR: DRHD: handling fault status reg 2
>> [ 213.292102] DMAR: [INTR-REMAP] Request device [00:00.0] fault index 1c [fault reason 38] Blocked an interrupt request due to source-id verification failure
>> [ 223.406919] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=25158, last emitted seq=25160
>> [ 223.406926] [drm] IP block:tonga_ih is hung!
>> [ 223.407167] [drm] GPU recovery disabled.
>>
>> Regards,
>> Luís
>>
>>
>> On Wed, Jan 3, 2018 at 5:47 PM, Luís Mendes <luis.p.mendes at gmail.com> wrote:
>>> Hi Michel, Christian,
>>>
>>> Christian, I have followed your suggestion and have just submitted a
>>> bug to fdo at https://bugs.freedesktop.org/show_bug.cgi?id=104481
>>> (GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7
>>> platforms while playing video).
>>>
>>> Michel, amdgpu.dc=0 seems to make no difference. I will try
>>> amd-staging-drm-next and report back.
>>>
>>> Regards,
>>> Luís
>>>
>>> On Wed, Jan 3, 2018 at 5:09 PM, Michel Dänzer <michel at daenzer.net> wrote:
>>>> On 2018-01-03 12:02 PM, Luís Mendes wrote:
>>>>>
>>>>> I believe the GPU lockup only happens when doing a page flip,
>>>>> since the kernel locks up with:
>>>>> [ 243.693200] kworker/u4:3 D 0 89 2 0x00000000
>>>>> [ 243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
>>>>> [ 243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
>>>>> [ 243.693259] [<80b8cdd0>] (schedule) from [<80b91024>] (schedule_timeout+0x228/0x444)
>>>>> [ 243.693270] [<80b91024>] (schedule_timeout) from [<80886738>] (dma_fence_default_wait+0x2b4/0x2d8)
>>>>> [ 243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>] (dma_fence_wait_timeout+0x40/0x150)
>>>>> [ 243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>] (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>>>>> [ 243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>>>>> [ 243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>>>>> ...
>>>>
>>>> Does the problem also occur if you disable DC with amdgpu.dc=0 on the
>>>> kernel command line?
>>>>
>>>> Does it also happen with a kernel built from the amd-staging-drm-next
>>>> branch instead of drm-next-4.16?
>>>>
>>>>
>>>> --
>>>> Earthling Michel Dänzer | http://www.amd.com
>>>> Libre software enthusiast | Mesa and X developer