Re: [PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Public Use]


We need to figure out what the root cause is then.  If we can't figure it out soon, we should revert the change for navi1x and continue to debug it until we can find the root cause and we can safely re-enable it.

Alex

From: Chen, Guchun <Guchun.Chen@xxxxxxx>
Sent: Sunday, November 29, 2020 2:22 AM
To: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>; Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Cc: Gui, Jack <Jack.Gui@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; amd-gfx mailing list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Subject: RE: [PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs
 
[AMD Public Use]

Hi Bas Nieuwenhuizen,

I don't think direct revert is one right approach, though it's able to fix your problem.  noretry=0 will cause other test failure on several ASICs.

Regards,
Guchun

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Bas Nieuwenhuizen
Sent: Sunday, November 29, 2020 8:38 AM
To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Cc: Gui, Jack <Jack.Gui@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; amd-gfx mailing list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Subject: Re: [PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs

Can we revert this patch to fix
https://nam11.safelinks.protection.outlook.com/?url=""> ?

On Thu, Oct 15, 2020 at 4:30 PM Felix Kuehling <felix.kuehling@xxxxxxx> wrote:
>
> Am 2020-10-14 um 11:35 p.m. schrieb Chengming Gui:
> > noretry = 0 cause some dGPU's kfd page fault tests fail, so set
> > noretry to 1 for these special ASICs:
> > vega20/navi10/navi14/ARCTURUS
> >
> > v2: merge raven and default case due to the same setting
> > v3: remove ARCTURUS
> >
> > Signed-off-by: Chengming Gui <Jack.Gui@xxxxxxx>
> > Change-Id: I3be70f463a49b0cd5c56456431d6c2cb98b13872
>
> Acked-by: Felix Kuhling <Felix.Kuehling@xxxxxxx>
>
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 23
> > +++++++++++++++--------
> >  1 file changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > index 36604d751d62..f26eb4e54b12 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > @@ -425,20 +425,27 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
> >       struct amdgpu_gmc *gmc = &adev->gmc;
> >
> >       switch (adev->asic_type) {
> > -     case CHIP_RAVEN:
> > -             /* Raven currently has issues with noretry
> > -              * regardless of what we decide for other
> > -              * asics, we should leave raven with
> > -              * noretry = 0 until we root cause the
> > -              * issues.
> > +     case CHIP_VEGA20:
> > +     case CHIP_NAVI10:
> > +     case CHIP_NAVI14:
> > +             /*
> > +              * noretry = 0 will cause kfd page fault tests fail
> > +              * for some ASICs, so set default to 1 for these ASICs.
> >                */
> >               if (amdgpu_noretry == -1)
> > -                     gmc->noretry = 0;
> > +                     gmc->noretry = 1;
> >               else
> >                       gmc->noretry = amdgpu_noretry;
> >               break;
> > +     case CHIP_RAVEN:
> >       default:
> > -             /* default this to 0 for now, but we may want
> > +             /* Raven currently has issues with noretry
> > +              * regardless of what we decide for other
> > +              * asics, we should leave raven with
> > +              * noretry = 0 until we root cause the
> > +              * issues.
> > +              *
> > +              * default this to 0 for now, but we may want
> >                * to change this in the future for certain
> >                * GPUs as it can increase performance in
> >                * certain cases.
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>
https://nam11.safelinks.protection.outlook.com/?url="">
> s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=""> > chun.chen%40amd.com%7C6d626e2a3bae4877024f08d893ff15db%7C3dd8961fe4884
> e608e11a82d994e183d%7C0%7C0%7C637422071085800476%7CUnknown%7CTWFpbGZsb
> 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%
> 7C1000&amp;sdata=VFqegGwPCj10q3Y5BdZsVq2a%2B4Tb358mYVDaNkA9zLU%3D&amp;
> reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://nam11.safelinks.protection.outlook.com/?url="">
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux