On 2018-09-18 10:31 a.m., Christian König wrote:
> Well, it looks like interrupt processing is working perfectly fine.
>
> But looking at the error message once more, I see that this actually
> affects ring number 9 and not the GFX ring.
>
> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
> number?
>
> That must be one of the compute rings.

That's a bingo.

[ 32.231734] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on minor 0
[ 32.233803] modprobe (3816) used greatest stack depth: 12464 bytes left
[ 35.266007] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 35.266373] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring (kiq_2.1.0) 9 (-110).
[ 35.403034] [drm:process_one_work] *ERROR* ib ring test failed (-110).

I should point out that kfd still has the old fence logic:

[root@raven amd]# git grep enable_signaling
amdgpu/amdgpu_amdkfd_fence.c: * nofity when the BO is free to move. fence_add_callback --> enable_signaling
amdgpu/amdgpu_amdkfd_fence.c: * --> amdgpu_amdkfd_fence.enable_signaling
amdgpu/amdgpu_amdkfd_fence.c: * amdgpu_amdkfd_fence.enable_signaling - Start a work item that will quiesce
amdgpu/amdgpu_amdkfd_fence.c: * amdkfd_fence_enable_signaling - This gets called when TTM wants to evict
amdgpu/amdgpu_amdkfd_fence.c:static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
amdgpu/amdgpu_amdkfd_fence.c: .enable_signaling = amdkfd_fence_enable_signaling,

Tom

>
> Thanks,
> Christian.
>
> On 18.09.2018 at 16:20, Tom St Denis wrote:
>> On 2018-09-18 10:13 a.m., Christian König wrote:
>>> Mhm, there is no failed IB test in there any more, is there?
>>
>> Oh, sorry, I thought you wanted to test HEAD~... Attached is a log from
>> the tip of drm-next.
>>
>> Tom
>>
>>>
>>> Christian.
>>>
>>> On 18.09.2018 at 16:09, Tom St Denis wrote:
>>>> Disabling the IOMMU in the BIOS resulted in a correct boot-up...
>>>>
>>>> Here's the log.
>>>>
>>>> Tom
>>>>
>>>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>>> Odd, I couldn't even boot my system with the dGPU as primary after
>>>>> rebuilding the kernel. It got hung up in the IOMMU driver (loads
>>>>> of AMD-Vi IOMMU errors), which I wasn't able to capture because it
>>>>> panicked before loading the network stack.
>>>>>
>>>>> Bizarre.
>>>>>
>>>>> I'll keep trying.
>>>>>
>>>>> Tom
>>>>>
>>>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>>> On 18.09.2018 at 15:32, Tom St Denis wrote:
>>>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>>>>> Great, not sure if that is good or bad news.
>>>>>>>>
>>>>>>>> Anyway, I'm going to revert the change for now. Does anybody
>>>>>>>> volunteer to figure out why interrupts sometimes don't work
>>>>>>>> correctly on Raven?
>>>>>>>
>>>>>>> What does "doesn't work correctly" mean? My workstation is a Raven1
>>>>>>> (Ryzen 2400G) and, other than the TTM bulk move issue, has been
>>>>>>> perfectly stable (through suspend/resumes too, I might add).
>>>>>>>
>>>>>>> Anything I could test with my devel Raven?
>>>>>>
>>>>>> The problem seems to be that on some boards IH handling doesn't
>>>>>> work as it should.
>>>>>>
>>>>>> Can you try to disable the onboard graphics and try again?
>>>>>>
>>>>>> If that still doesn't work, there is a DRM_DEBUG in
>>>>>> amdgpu_ih_process(); make that a DRM_ERROR and send me the
>>>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> On 18.09.2018 at 15:27, Tom St Denis wrote:
>>>>>>>>> This commit:
>>>>>>>>>
>>>>>>>>> [root@raven linux]# git bisect good
>>>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>>>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>>>>>> Date:  Tue Sep 18 10:38:09 2018 +0200
>>>>>>>>>
>>>>>>>>>     drm/amdgpu: remove fence fallback
>>>>>>>>>
>>>>>>>>>     DC doesn't seem to have a fallback path either.
>>>>>>>>>
>>>>>>>>>     So when interrupts doesn't work any more we are pretty much
>>>>>>>>>     busted no matter what.
>>>>>>>>>
>>>>>>>>>     Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou at amd.com>
>>>>>>>>>
>>>>>>>>> Results in this:
>>>>>>>>>
>>>>>>>>> [  24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:07:00.0 on minor 1
>>>>>>>>> [  24.335674] modprobe (3895) used greatest stack depth: 12600 bytes left
>>>>>>>>> [  26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
>>>>>>>>> [  26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (-110).
>>>>>>>>> [  26.407885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
>>>>>>>>> [  28.506708] fuse init (API version 7.27)
>>>>>>>>>
>>>>>>>>> On init with my Polaris/Raven1 system.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Tom
>>>>>>>>> _______________________________________________
>>>>>>>>> amd-gfx mailing list
>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
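Christian's request to have amdgpu_ib_ring_tests() print ring->name instead of the bare index, which produced the "(kiq_2.1.0) 9" in Tom's patched log, can be illustrated with a small userspace sketch. struct mock_ring and format_ib_failure() are hypothetical stand-ins, not the amdgpu code; the message format only mirrors the log line in the thread, and the real driver logs through DRM_ERROR rather than snprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct amdgpu_ring, reduced to the two
 * fields this sketch needs. */
struct mock_ring {
	const char *name;  /* e.g. "gfx" or "kiq_2.1.0" */
	unsigned int idx;  /* index into the device's ring array */
};

/*
 * Build the IB-test failure message with the ring's name next to its
 * index, so "ring 9" can be identified at a glance.
 * Returns what snprintf() returns: the untruncated message length.
 */
static int format_ib_failure(char *buf, size_t len,
			     const struct mock_ring *ring, int err)
{
	/* Every ring is expected to carry a non-empty name. */
	assert(ring != NULL && ring->name != NULL && strlen(ring->name) > 0);
	return snprintf(buf, len, "failed testing IB on ring (%s) %u (%d).",
			ring->name, ring->idx, err);
}
```

For example, a ring named "kiq_2.1.0" at index 9 with err = -110 yields "failed testing IB on ring (kiq_2.1.0) 9 (-110).", which identifies the KIQ ring where the original message only said "ring 9".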
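The commit message explains why removing the fence fallback can surface interrupt problems: without a fallback, a waiter only learns that a fence completed when the interrupt path processes it, so if interrupts for a ring never arrive, the wait can only end in a timeout (-110 is -ETIMEDOUT on Linux). The userspace model below sketches that under stated assumptions: struct mock_fence_drv, mock_irq(), mock_fallback_poll() and mock_fence_wait() are hypothetical, loop iterations stand in for timer periods, and use_fallback approximates a periodic poll that reads the completed sequence number directly instead of waiting for an interrupt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical fence-driver model. The "GPU" writes the last completed
 * sequence number to hw_seq; the driver only notices it once seen_seq
 * is updated. These are not the real amdgpu fields.
 */
struct mock_fence_drv {
	uint32_t hw_seq;    /* written by the GPU when work completes */
	uint32_t seen_seq;  /* last sequence number the driver processed */
	bool irq_works;     /* false models an interrupt that never arrives */
	bool use_fallback;  /* true models a periodic fallback poll */
};

/* Interrupt path: copies hw_seq only if the interrupt is delivered. */
static void mock_irq(struct mock_fence_drv *drv)
{
	if (drv->irq_works)
		drv->seen_seq = drv->hw_seq;
}

/* Fallback path: reads hw_seq directly, no interrupt needed. */
static void mock_fallback_poll(struct mock_fence_drv *drv)
{
	drv->seen_seq = drv->hw_seq;
}

/*
 * Wait up to @ticks simulated timer periods for @seq to complete.
 * Returns 0 on success or -ETIMEDOUT (which is -110 on Linux).
 */
static int mock_fence_wait(struct mock_fence_drv *drv, uint32_t seq,
			   int ticks)
{
	for (int t = 0; t < ticks; t++) {
		mock_irq(drv);
		if (drv->use_fallback)
			mock_fallback_poll(drv);
		/* Wrap-safe "seen_seq >= seq" for 32-bit sequence numbers. */
		if ((int32_t)(drv->seen_seq - seq) >= 0)
			return 0;
	}
	return -ETIMEDOUT;
}
```

With irq_works = false, the wait returns -ETIMEDOUT unless use_fallback is set, which mirrors the trade-off the commit message describes: the fallback hides missing interrupts, and removing it makes them fatal.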