No worries, I will just revert locally until then to clear the extra errors
during my investigation of current GPU reset status and issues.

Andrey

On 09/21/2018 01:53 PM, Christian König wrote:
> I unfortunately don't have a Polaris to test this myself.
>
> But please give me time till Monday so that I can at least try one
> more thing to fix it.
>
> Christian.
>
> On 21.09.2018 at 19:11, Andrey Grodzovsky wrote:
>>
>> Ping...
>>
>> Andrey
>>
>> On 09/20/2018 04:35 PM, Andrey Grodzovsky wrote:
>>>
>>> What's the status with this error and the suggested patch to fix it?
>>> It impacts GPU reset on Polaris11.
>>>
>>> Do we want to investigate why the original patch breaks it or just
>>> disable with the proposed patch?
>>>
>>> P.S. Suspend/resume also stopped working on latest branch - will
>>> bisect it later today or tomorrow.
>>>
>>> Andrey
>>>
>>> On 09/18/2018 11:00 AM, Christian König wrote:
>>>> Tom,
>>>>
>>>> can you try if the following makes it working again?
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>> index b6160de70d12..d65f5ba92fc5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>> @@ -937,6 +937,10 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>>>        return r;
>>>>  }
>>>>
>>>> +static int gfx_v8_0_kiq_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>>> +{
>>>> +      return 0;
>>>> +}
>>>>
>>>>  static void gfx_v8_0_free_microcode(struct amdgpu_device *adev)
>>>>  {
>>>> @@ -7174,7 +7178,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_kiq = {
>>>>        .emit_ib = gfx_v8_0_ring_emit_ib_compute,
>>>>        .emit_fence = gfx_v8_0_ring_emit_fence_kiq,
>>>>        .test_ring = gfx_v8_0_ring_test_ring,
>>>> -      .test_ib = gfx_v8_0_ring_test_ib,
>>>> +      .test_ib = gfx_v8_0_kiq_ring_test_ib,
>>>>        .insert_nop = amdgpu_ring_insert_nop,
>>>>        .pad_ib = amdgpu_ring_generic_pad_ib,
>>>>        .emit_rreg = gfx_v8_0_ring_emit_rreg,
>>>>
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> On 18.09.2018 at 16:41, Christian König wrote:
>>>>> CRTC and GFX interrupts seem to be working perfectly fine.
>>>>>
>>>>> The problem here looks like only EOP interrupts from the Compute
>>>>> queue are not correctly handled.
>>>>>
>>>>> Most likely a bug somewhere in gfx_v8_0_eop_irq().
>>>>>
>>>>> Christian.
>>>>>
>>>>> On 18.09.2018 at 16:36, Deucher, Alexander wrote:
>>>>>>
>>>>>> FWIW, a number of consumer Raven boards have bad IVRS tables
>>>>>> (Windows doesn't use interrupt remapping, so they are sometimes
>>>>>> wrong and probably not validated). There are a number of
>>>>>> workarounds to manually override the IVRS tables to make
>>>>>> interrupts work. I think specifying pci=noacpi is also a
>>>>>> possible workaround.
>>>>>>
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf
>>>>>> of Christian König <christian.koenig at amd.com>
>>>>>> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
>>>>>> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
>>>>>> *Subject:* Re: Regression on gfx8 with ring init
>>>>>> Well, looks like interrupt processing is working perfectly fine.
>>>>>>
>>>>>> But looking at the error message once more I see that this actually
>>>>>> affects ring number 9 and not the GFX ring.
>>>>>>
>>>>>> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of
>>>>>> the number?
>>>>>>
>>>>>> That must be one of the compute rings.
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>> On 18.09.2018 at 16:20, Tom St Denis wrote:
>>>>>> > On 2018-09-18 10:13 a.m., Christian König wrote:
>>>>>> >> Mhm, there is no more failed IB test in there, is there?
>>>>>> >
>>>>>> > oh sorry I thought you wanted to test HEAD~ ... Attached is a log
>>>>>> > from the tip of drm-next
>>>>>> >
>>>>>> > Tom
>>>>>> >
>>>>>> >>
>>>>>> >> Christian.
>>>>>> >>
>>>>>> >> On 18.09.2018 at 16:09, Tom St Denis wrote:
>>>>>> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>>>> >>>
>>>>>> >>> Here's the log.
>>>>>> >>>
>>>>>> >>> Tom
>>>>>> >>>
>>>>>> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>>>> >>>> Odd, I couldn't even boot my system with the dGPU as primary
>>>>>> >>>> after rebuilding the kernel. It got hung up in the IOMMU
>>>>>> >>>> driver (loads of AMD-Vi IOMMU errors) which I wasn't able to
>>>>>> >>>> capture because it panic'ed before loading the network stack.
>>>>>> >>>>
>>>>>> >>>> Bizarre.
>>>>>> >>>>
>>>>>> >>>> I'll keep trying.
>>>>>> >>>>
>>>>>> >>>> Tom
>>>>>> >>>>
>>>>>> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>>> >>>>> On 18.09.2018 at 15:32, Tom St Denis wrote:
>>>>>> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>>> >>>>>>> Great, not sure if that is good or bad news.
>>>>>> >>>>>>>
>>>>>> >>>>>>> Anyway going to revert the change for now. Does anybody
>>>>>> >>>>>>> volunteer to figure out why interrupts sometimes doesn't
>>>>>> >>>>>>> work correctly on Raven?
>>>>>> >>>>>>
>>>>>> >>>>>> What does "doesn't work correctly" mean? My workstation is
>>>>>> >>>>>> a Raven1 (Ryzen 2400G) and other than the TTM bulk move
>>>>>> >>>>>> issue has been perfectly stable (through suspend/resumes
>>>>>> >>>>>> too I might add).
>>>>>> >>>>>>
>>>>>> >>>>>> Anything I could test with my devel raven?
>>>>>> >>>>>
>>>>>> >>>>> The problem seems to be that on some boards IH handling
>>>>>> >>>>> doesn't work as it should.
>>>>>> >>>>>
>>>>>> >>>>> Can you try to disable the onboard graphics and try again?
>>>>>> >>>>>
>>>>>> >>>>> If that still doesn't work there is a DRM_DEBUG in
>>>>>> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>>>>> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>>> >>>>>
>>>>>> >>>>> Thanks,
>>>>>> >>>>> Christian.
>>>>>> >>>>>
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> Tom
>>>>>> >>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> Christian.
>>>>>> >>>>>>>
>>>>>> >>>>>>> On 18.09.2018 at 15:27, Tom St Denis wrote:
>>>>>> >>>>>>>> This commit:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> [root at raven linux]# git bisect good
>>>>>> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>>>>>> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>>> >>>>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>>> >>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>     drm/amdgpu: remove fence fallback
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>     DC doesn't seem to have a fallback path either.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>     So when interrupts doesn't work any more we are pretty much busted no
>>>>>> >>>>>>>>     matter what.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>> >>>>>>>> Reviewed-by: Chunming Zhou <david1.zhou at amd.com>
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Results in this:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> [  24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:07:00.0 on minor 1
>>>>>> >>>>>>>> [  24.335674] modprobe (3895) used greatest stack depth: 12600 bytes left
>>>>>> >>>>>>>> [  26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
>>>>>> >>>>>>>> [  26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (-110).
>>>>>> >>>>>>>> [  26.407885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
>>>>>> >>>>>>>> [  28.506708] fuse init (API version 7.27)
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> On init with my polaris/raven1 system.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Cheers,
>>>>>> >>>>>>>> Tom
>>>>>> >>>>>>>> _______________________________________________
>>>>>> >>>>>>>> amd-gfx mailing list
>>>>>> >>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>> >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
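
For reference, the two suggestions made in the thread (giving the KIQ ring its own test_ib hook that simply returns 0, and reporting IB test failures by ring->name rather than by ring index) can be modelled outside the kernel roughly as follows. This is only a userspace sketch under stated assumptions, not the amdgpu code: struct ring, struct ring_funcs, run_ib_tests(), demo_rings and the ring names "gfx", "compute0" and "kiq" are all hypothetical stand-ins, and the compute hook is hard-wired to return -110 purely to mimic the timeout seen in the dmesg output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Value seen in the log above ("-110"); defined locally so the sketch
 * does not depend on the platform's errno numbering. */
#define SKETCH_ETIMEDOUT 110

struct ring;

/* Hypothetical stand-in for a per-ring function table. */
struct ring_funcs {
    int (*test_ib)(struct ring *ring, long timeout);
};

/* Hypothetical stand-in for a ring: just a name and its hooks. */
struct ring {
    const char *name;
    const struct ring_funcs *funcs;
};

/* IB test that always passes (models a healthy ring). */
static int passing_test_ib(struct ring *ring, long timeout)
{
    (void)ring;
    (void)timeout;
    return 0;
}

/* IB test that always times out (models the failing compute ring). */
static int timing_out_test_ib(struct ring *ring, long timeout)
{
    (void)ring;
    (void)timeout;
    return -SKETCH_ETIMEDOUT;
}

/* Stubbed-out IB test: same idea as the proposed KIQ hook, which
 * skips the test and reports success. */
static int kiq_test_ib(struct ring *ring, long timeout)
{
    (void)ring;
    (void)timeout;
    return 0;
}

static const struct ring_funcs gfx_funcs = { .test_ib = passing_test_ib };
static const struct ring_funcs compute_funcs = { .test_ib = timing_out_test_ib };
static const struct ring_funcs kiq_funcs = { .test_ib = kiq_test_ib };

/* Illustrative ring set; names are made up for this sketch. */
struct ring demo_rings[] = {
    { "gfx", &gfx_funcs },
    { "compute0", &compute_funcs },
    { "kiq", &kiq_funcs },
};

/* Run every ring's test_ib hook. On the first failure, format a
 * message that names the ring instead of printing its index.
 * Returns 0 if all tests pass, otherwise the first error code. */
int run_ib_tests(struct ring *rings, size_t count, long timeout,
                 char *msg, size_t msg_len)
{
    int first_err = 0;
    size_t i;

    assert(rings != NULL && msg != NULL);
    if (msg_len > 0)
        msg[0] = '\0';

    for (i = 0; i < count; i++) {
        int r = rings[i].funcs->test_ib(&rings[i], timeout);

        if (r != 0 && first_err == 0) {
            snprintf(msg, msg_len, "failed testing IB on ring %s (%d)",
                     rings[i].name, r);
            first_err = r;
        }
    }
    return first_err;
}
```

Calling run_ib_tests(demo_rings, 3, 1000, msg, sizeof(msg)) on this made-up ring set returns the compute ring's error and leaves a message in msg that identifies the ring by name, which is the kind of output asked for when the failure turned out to be on "ring 9" rather than the GFX ring.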