Another related thought: I think the reason some chips had failing VM fault tests with noretry=0 was a dependency on IH rerouting of retry faults. This dependency has been fixed by Christian recently:

commit 849c62248ee84c1e304a9ce2f673c79e23f29bf9
Author: Christian König <christian.koenig@xxxxxxx>
Date:   Sat Oct 31 18:39:54 2020 +0100

    drm/amdgpu: enabled software IH ring for Vega

    Seems like we won't get the hardware IH1/2 rings on Vega20 working.

    Signed-off-by: Christian König <christian.koenig@xxxxxxx>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>

 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 7 +++++++
 1 file changed, 7 insertions(+)

commit 198237744d85c4a23914de56d78fba0acf5a2803
Author: Christian König <christian.koenig@xxxxxxx>
Date:   Tue Nov 3 14:22:50 2020 +0100

    drm/amdgpu: enabled software IH ring for Navi

    Felix pointed out that we need this for Navi as well.

    Signed-off-by: Christian König <christian.koenig@xxxxxxx>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>

 drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 7 +++++++
 1 file changed, 7 insertions(+)

So it should now be safe to enable retry faults on most chips. Only on GFXv9 can there be a performance advantage to disabling retry.

Regards,
  Felix

On 2020-11-30 at 12:35 p.m., Felix Kuehling wrote:
> Like I stated elsewhere, I would recommend noretry=0 for Navi and later
> GPUs because there is no performance advantage from disabling retry on
> those GPUs.
>
>
> Regards,
> Felix
>
>
> On 2020-11-30 at 12:22 p.m., Deucher, Alexander wrote:
>> [AMD Public Use]
>>
>>
>> We need to figure out what the root cause is then. If we can't figure
>> it out soon, we should revert the change for navi1x and continue to
>> debug it until we can find the root cause and we can safely re-enable it.
>>
>> Alex
>> ------------------------------------------------------------------------
>> *From:* Chen, Guchun <Guchun.Chen@xxxxxxx>
>> *Sent:* Sunday, November 29, 2020 2:22 AM
>> *To:* Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>; Kuehling, Felix
>> <Felix.Kuehling@xxxxxxx>
>> *Cc:* Gui, Jack <Jack.Gui@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>;
>> amd-gfx mailing list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Huang, Ray
>> <Ray.Huang@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>;
>> Zhang, Hawking <Hawking.Zhang@xxxxxxx>
>> *Subject:* RE: [PATCH v3] drm/amd/amdgpu: set the default value of
>> noretry to 1 for some dGPUs
>>
>> [AMD Public Use]
>>
>> Hi Bas Nieuwenhuizen,
>>
>> I don't think a direct revert is the right approach, though it would
>> fix your problem. noretry=0 will cause other test failures on several
>> ASICs.
>>
>> Regards,
>> Guchun
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Bas
>> Nieuwenhuizen
>> Sent: Sunday, November 29, 2020 8:38 AM
>> To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
>> Cc: Gui, Jack <Jack.Gui@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>;
>> Zhou1, Tao <Tao.Zhou1@xxxxxxx>; amd-gfx mailing list
>> <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>;
>> Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Hawking
>> <Hawking.Zhang@xxxxxxx>
>> Subject: Re: [PATCH v3] drm/amd/amdgpu: set the default value of
>> noretry to 1 for some dGPUs
>>
>> Can we revert this patch to fix
>> https://gitlab.freedesktop.org/drm/amd/-/issues/1374 ?
>>
>> On Thu, Oct 15, 2020 at 4:30 PM Felix Kuehling
>> <felix.kuehling@xxxxxxx> wrote:
>>> On 2020-10-14 at 11:35 p.m., Chengming Gui wrote:
>>>> noretry = 0 causes some dGPUs' KFD page fault tests to fail, so set
>>>> noretry to 1 for these special ASICs:
>>>> vega20/navi10/navi14/ARCTURUS
>>>>
>>>> v2: merge the raven and default cases since they use the same setting
>>>> v3: remove ARCTURUS
>>>>
>>>> Signed-off-by: Chengming Gui <Jack.Gui@xxxxxxx>
>>>> Change-Id: I3be70f463a49b0cd5c56456431d6c2cb98b13872
>>> Acked-by: Felix Kuhling <Felix.Kuehling@xxxxxxx>
>>>
>>>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 23 +++++++++++++++--------
>>>>  1 file changed, 15 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>>> index 36604d751d62..f26eb4e54b12 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>>> @@ -425,20 +425,27 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
>>>>  	struct amdgpu_gmc *gmc = &adev->gmc;
>>>>
>>>>  	switch (adev->asic_type) {
>>>> -	case CHIP_RAVEN:
>>>> -		/* Raven currently has issues with noretry
>>>> -		 * regardless of what we decide for other
>>>> -		 * asics, we should leave raven with
>>>> -		 * noretry = 0 until we root cause the
>>>> -		 * issues.
>>>> +	case CHIP_VEGA20:
>>>> +	case CHIP_NAVI10:
>>>> +	case CHIP_NAVI14:
>>>> +		/*
>>>> +		 * noretry = 0 will cause kfd page fault tests fail
>>>> +		 * for some ASICs, so set default to 1 for these ASICs.
>>>>  		 */
>>>>  		if (amdgpu_noretry == -1)
>>>> -			gmc->noretry = 0;
>>>> +			gmc->noretry = 1;
>>>>  		else
>>>>  			gmc->noretry = amdgpu_noretry;
>>>>  		break;
>>>> +	case CHIP_RAVEN:
>>>>  	default:
>>>> -		/* default this to 0 for now, but we may want
>>>> +		/* Raven currently has issues with noretry
>>>> +		 * regardless of what we decide for other
>>>> +		 * asics, we should leave raven with
>>>> +		 * noretry = 0 until we root cause the
>>>> +		 * issues.
>>>> +		 *
>>>> +		 * default this to 0 for now, but we may want
>>>>  		 * to change this in the future for certain
>>>>  		 * GPUs as it can increase performance in
>>>>  		 * certain cases.
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx