Ping...

Andrey


On 09/20/2018 04:35 PM, Andrey Grodzovsky wrote:
>
> What's the status with this error and the suggested patch to fix it?
> It impacts GPU reset on Polaris11.
>
> Do we want to investigate why the original patch breaks it or just
> disable with the proposed patch?
>
>
> P.S. Suspend/resume also stopped working on the latest branch - will bisect
> it later today or tomorrow.
>
>
> Andrey
>
>
> On 09/18/2018 11:00 AM, Christian König wrote:
>> Tom,
>>
>> can you try if the following makes it work again?
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> index b6160de70d12..d65f5ba92fc5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> @@ -937,6 +937,10 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>        return r;
>>  }
>>
>> +static int gfx_v8_0_kiq_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>> +{
>> +      return 0;
>> +}
>>
>>  static void gfx_v8_0_free_microcode(struct amdgpu_device *adev)
>>  {
>> @@ -7174,7 +7178,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_kiq = {
>>        .emit_ib = gfx_v8_0_ring_emit_ib_compute,
>>        .emit_fence = gfx_v8_0_ring_emit_fence_kiq,
>>        .test_ring = gfx_v8_0_ring_test_ring,
>> -      .test_ib = gfx_v8_0_ring_test_ib,
>> +      .test_ib = gfx_v8_0_kiq_ring_test_ib,
>>        .insert_nop = amdgpu_ring_insert_nop,
>>        .pad_ib = amdgpu_ring_generic_pad_ib,
>>        .emit_rreg = gfx_v8_0_ring_emit_rreg,
>>
>>
>> Thanks,
>> Christian.
>>
>> On 18.09.2018 at 16:41, Christian König wrote:
>>> CRTC and GFX interrupts seem to be working perfectly fine.
>>>
>>> The problem here looks like only EOP interrupts from the Compute
>>> queue are not correctly handled.
>>>
>>> Most likely a bug somewhere in gfx_v8_0_eop_irq().
>>>
>>> Christian.
>>>
>>> On 18.09.2018 at 16:36, Deucher, Alexander wrote:
>>>>
>>>> FWIW, a number of consumer Raven boards have bad IVRS tables
>>>> (Windows doesn't use interrupt remapping, so they are sometimes
>>>> wrong and probably not validated). There are a number of workarounds
>>>> to manually override the IVRS tables to make interrupts work. I
>>>> think specifying pci=noacpi is also a possible workaround.
>>>>
>>>>
>>>> Alex
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf
>>>> of Christian König <christian.koenig at amd.com>
>>>> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
>>>> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
>>>> *Subject:* Re: Regression on gfx8 with ring init
>>>> Well, looks like interrupt processing is working perfectly fine.
>>>>
>>>> But looking at the error message once more I see that this actually
>>>> affects ring number 9 and not the GFX ring.
>>>>
>>>> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
>>>> number?
>>>>
>>>> That must be one of the compute rings.
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> On 18.09.2018 at 16:20, Tom St Denis wrote:
>>>> > On 2018-09-18 10:13 a.m., Christian König wrote:
>>>> >> Mhm, there is no more failed IB-test in there, is there?
>>>> >
>>>> > oh sorry I thought you wanted to test HEAD~ ... Attached is a log from
>>>> > the tip of drm-next
>>>> >
>>>> > Tom
>>>> >
>>>> >>
>>>> >> Christian.
>>>> >>
>>>> >> On 18.09.2018 at 16:09, Tom St Denis wrote:
>>>> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>> >>>
>>>> >>> Here's the log.
>>>> >>>
>>>> >>> Tom
>>>> >>>
>>>> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>> >>>> Odd, I couldn't even boot my system with the dGPU as primary after
>>>> >>>> rebuilding the kernel.
>>>> >>>> It got hung up in the IOMMU driver (loads
>>>> >>>> of AMD-Vi IOMMU errors) which I wasn't able to capture because it
>>>> >>>> panicked before loading the network stack.
>>>> >>>>
>>>> >>>> Bizarre.
>>>> >>>>
>>>> >>>> I'll keep trying.
>>>> >>>>
>>>> >>>> Tom
>>>> >>>>
>>>> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>> >>>>> On 18.09.2018 at 15:32, Tom St Denis wrote:
>>>> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>> >>>>>>> Great, not sure if that is good or bad news.
>>>> >>>>>>>
>>>> >>>>>>> Anyway going to revert the change for now. Does anybody
>>>> >>>>>>> volunteer to figure out why interrupts sometimes doesn't work
>>>> >>>>>>> correctly on Raven?
>>>> >>>>>>
>>>> >>>>>> What does "doesn't work correctly" mean? My workstation is a Raven1
>>>> >>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
>>>> >>>>>> perfectly stable (through suspend/resumes too I might add).
>>>> >>>>>>
>>>> >>>>>> Anything I could test with my devel raven?
>>>> >>>>>
>>>> >>>>> The problem seems to be that on some boards IH handling doesn't
>>>> >>>>> work as it should.
>>>> >>>>>
>>>> >>>>> Can you try to disable the onboard graphics and try again?
>>>> >>>>>
>>>> >>>>> If that still doesn't work there is a DRM_DEBUG in
>>>> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>>> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>> >>>>>
>>>> >>>>> Thanks,
>>>> >>>>> Christian.
>>>> >>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Tom
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>> Christian.
>>>> >>>>>>>
>>>> >>>>>>> On 18.09.2018 at 15:27, Tom St Denis wrote:
>>>> >>>>>>>> This commit:
>>>> >>>>>>>>
>>>> >>>>>>>> [root at raven linux]# git bisect good
>>>> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>>>> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>> >>>>>>>> Author: Christian König <christian.koenig at amd.com>
>>>> >>>>>>>> Date:  Tue Sep 18 10:38:09 2018 +0200
>>>> >>>>>>>>
>>>> >>>>>>>>    drm/amdgpu: remove fence fallback
>>>> >>>>>>>>
>>>> >>>>>>>>    DC doesn't seem to have a fallback path either.
>>>> >>>>>>>>
>>>> >>>>>>>>    So when interrupts doesn't work any more we are pretty much busted no
>>>> >>>>>>>>    matter what.
>>>> >>>>>>>>
>>>> >>>>>>>>    Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> >>>>>>>>    Reviewed-by: Chunming Zhou <david1.zhou at amd.com>
>>>> >>>>>>>>
>>>> >>>>>>>> Results in this:
>>>> >>>>>>>>
>>>> >>>>>>>> [  24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:07:00.0 on minor 1
>>>> >>>>>>>> [  24.335674] modprobe (3895) used greatest stack depth: 12600 bytes left
>>>> >>>>>>>> [  26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
>>>> >>>>>>>> [  26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (-110).
>>>> >>>>>>>> [  26.407885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
>>>> >>>>>>>> [  28.506708] fuse init (API version 7.27)
>>>> >>>>>>>>
>>>> >>>>>>>> On init with my polaris/raven1 system.
>>>> >>>>>>>>
>>>> >>>>>>>> Cheers,
>>>> >>>>>>>> Tom
>>>> >>>>>>>> _______________________________________________
>>>> >>>>>>>> amd-gfx mailing list
>>>> >>>>>>>> amd-gfx at lists.freedesktop.org
>>>> >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
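
Christian's request in the thread, to have amdgpu_ib_ring_tests() print ring->name instead of the ring number, can be illustrated with a small stand-alone sketch. This is not the driver code: struct mock_ring and format_ib_test_failure() are hypothetical stand-ins, and the message text only imitates the log line quoted above. It assumes a Linux build, where the -110 seen in the log corresponds to -ETIMEDOUT.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the driver's struct amdgpu_ring. Only the
 * two fields this sketch needs are modelled; the real struct is larger.
 */
struct mock_ring {
	unsigned int idx;	/* numeric ring index, e.g. 9 in the log */
	char name[16];		/* human-readable ring name */
};

/*
 * Format an IB test failure message that identifies the ring by name
 * rather than by index, and flags timeouts explicitly.
 * Returns the snprintf() result (length the full message would have).
 */
static int format_ib_test_failure(char *buf, size_t len,
				  const struct mock_ring *ring, int err)
{
	assert(buf != NULL && ring != NULL);

	return snprintf(buf, len, "failed testing IB on ring %s (%d)%s",
			ring->name, err,
			err == -ETIMEDOUT ? " [timed out]" : "");
}
```

Calling format_ib_test_failure() with a ring whose name is, say, "example_ring" (an invented name) and err set to -ETIMEDOUT produces a message that names the ring directly, so a reader of dmesg would not have to work out which ring "9" is on a given board.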