Re: [PATCH] drm/amdgpu: Fix page table setup on Arcturus

Alex Deucher <alexdeucher@xxxxxxxxx> · Thu, 25 Aug 2022 11:25:43 -0400

On Thu, Aug 25, 2022 at 10:49 AM Joshi, Mukul <Mukul.Joshi@xxxxxxx> wrote:
>
> > -----Original Message-----
> > From: Alex Deucher <alexdeucher@xxxxxxxxx>
> > Sent: Thursday, August 25, 2022 9:33 AM
> > To: Joshi, Mukul <Mukul.Joshi@xxxxxxx>
> > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [PATCH] drm/amdgpu: Fix page table setup on Arcturus
> >
> > On Mon, Aug 22, 2022 at 11:53 AM Mukul Joshi <mukul.joshi@xxxxxxx>
> > wrote:
> > >
> > > When translate_further is enabled, page table depth needs to be
> > > updated. This was missing on Arcturus MMHUB init. This was causing
> > > address translations to fail for SDMA user-mode queues.
> > >
> >
> > Do other mmhub implementations need a similar fix?  It looks like some of
> > them are missing similar changes.
> >
>
> I am not sure if there is a plan to enable translate_further on other ASICs.
> For now, it's only enabled for Arcturus, Aldebaran and Raven.
> If we plan to enable it on other ASICs, then yes, the other mmhub implementations
> would need similar changes.

It would be nice to fix them up preemptively so that if we ever enable
it on another ASIC, it will just work.
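
For reference, the adjustment each mmhub setup_vmid_config would need
can be sketched as a standalone helper. This is only an illustration
modeled on the hunk quoted below, not driver code: struct vm_cfg,
struct ctx_fields and ctx1_fields() are hypothetical names, and the
sketch assumes block_size is at least 9 when translate_further is off.

```c
#include <stdbool.h>

/* Hypothetical stand-in for adev->vm_manager.{num_level,block_size}
 * and adev->gmc.translate_further. */
struct vm_cfg {
	unsigned int num_level;
	unsigned int block_size;
	bool translate_further;
};

/* Values that would be programmed into PAGE_TABLE_DEPTH and
 * PAGE_TABLE_BLOCK_SIZE of a VM_CONTEXT1_CNTL register. */
struct ctx_fields {
	unsigned int page_table_depth;
	unsigned int page_table_block_size;
};

static struct ctx_fields ctx1_fields(const struct vm_cfg *cfg)
{
	struct ctx_fields f = {
		.page_table_depth = cfg->num_level,
		.page_table_block_size = cfg->block_size,
	};

	if (cfg->translate_further)
		f.page_table_depth -= 1;	/* one level fewer, block size as-is */
	else
		f.page_table_block_size -= 9;	/* assumes block_size >= 9 */

	return f;
}
```

With illustrative inputs num_level = 3 and block_size = 9, this gives
depth 3 / block size 0 without translate_further and depth 2 / block
size 9 with it.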

Alex

>
> Regards,
> Mukul
>
> > Alex
> >
> > > Fixes: 2abf2573b1c69 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach")
> > > Signed-off-by: Mukul Joshi <mukul.joshi@xxxxxxx>
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
> > > index 6e0145b2b408..445cb06b9d26 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
> > > @@ -295,9 +295,17 @@ static void mmhub_v9_4_disable_identity_aperture(struct amdgpu_device *adev,
> > >  static void mmhub_v9_4_setup_vmid_config(struct amdgpu_device *adev, int hubid)
> > >  {
> > >         struct amdgpu_vmhub *hub = &adev->vmhub[AMDGPU_MMHUB_0];
> > > +       unsigned int num_level, block_size;
> > >         uint32_t tmp;
> > >         int i;
> > >
> > > +       num_level = adev->vm_manager.num_level;
> > > +       block_size = adev->vm_manager.block_size;
> > > +       if (adev->gmc.translate_further)
> > > +               num_level -= 1;
> > > +       else
> > > +               block_size -= 9;
> > > +
> > >         for (i = 0; i <= 14; i++) {
> > >                 tmp = RREG32_SOC15_OFFSET(MMHUB, 0, mmVML2VC0_VM_CONTEXT1_CNTL,
> > >                                 hubid * MMHUB_INSTANCE_REGISTER_OFFSET + i);
> > > @@ -305,7 +313,7 @@ static void mmhub_v9_4_setup_vmid_config(struct amdgpu_device *adev, int hubid)
> > >                                     ENABLE_CONTEXT, 1);
> > >                 tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT1_CNTL,
> > >                                     PAGE_TABLE_DEPTH,
> > > -                                   adev->vm_manager.num_level);
> > > +                                   num_level);
> > >                 tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT1_CNTL,
> > >                                     RANGE_PROTECTION_FAULT_ENABLE_DEFAULT, 1);
> > >                 tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT1_CNTL,
> > > @@ -323,7 +331,7 @@ static void mmhub_v9_4_setup_vmid_config(struct amdgpu_device *adev, int hubid)
> > >                                     EXECUTE_PROTECTION_FAULT_ENABLE_DEFAULT, 1);
> > >                 tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT1_CNTL,
> > >                                     PAGE_TABLE_BLOCK_SIZE,
> > > -                                   adev->vm_manager.block_size - 9);
> > > +                                   block_size);
> > >                 /* Send no-retry XNACK on fault to suppress VM fault storm. */
> > >                 tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT1_CNTL,
> > >                                     RETRY_PERMISSION_OR_INVALID_PAGE_FAULT,
> > > --
> > > 2.35.1
> > >