Re: [PATCH] drm/amdgpu: clean up psp ip if hw_init failed v2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 22, 2022 at 3:54 PM Wong, Alice <Shiwei.Wong@xxxxxxx> wrote:
>
> [AMD Official Use Only]
>
> Hi Alex,
>
> The attached patch freed most of the allocated memory except for one allocated by psp_tmr_init during psp_load_fw.
> Combination of the attached patch and calling psp_hw_fini when psp_hw_init failed would fix the issue.
>
> As a proper fix, we can call psp_tmr_terminate in psp_load_fw when psp_load_non_psp_fw failed. (attached patch)
> We can't move psp_tmr_init to sw_init because we need to load toc to get the TMR size.
> Do you have any concerns with this approach?

That only covers failures in hw_init().  It doesn't handle resume().
Looks like all of the ta functions also potentially leak.  I'm working
on a cleanup to handle all of these.

Alex

>
> Alice
>
>
> -----Original Message-----
> From: Alex Deucher <alexdeucher@xxxxxxxxx>
> Sent: April 21, 2022 1:31 AM
> To: Wong, Alice <Shiwei.Wong@xxxxxxx>
> Cc: amd-gfx list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
> Subject: Re: [PATCH] drm/amdgpu: clean up psp ip if hw_init failed v2
>
> On Wed, Apr 20, 2022 at 5:48 PM Alice Wong <shiwei.wong@xxxxxxx> wrote:
> >
> > If at any point psp_hw_init failed, psp_hw_fini would not be called
> > during unload due to ip_blocks[PSP].status.hw not being set to true.
> > This could cause a memory leak when the driver unloads.
> > As a rule of thumb, each IP block should cleanup themselves when their
> > hw_init fails. Only previously intialized blocks should be cleaned up
> > by the common framework.
> >
> > v1: Call IP blocks' respective hw_fini when hw_init failed from
> >     the common framework
> > v2: Call psp_hw_fini when psp_hw_init failed.
> >
> > Signed-off-by: Alice Wong <shiwei.wong@xxxxxxx>
>
> I don't think this is a good idea.  The hw programming in hw_fini makes no sense if the driver never set it up in the first place if hw_init failed before initializing the hw.  It would be better to just properly unwind the relevant functions.  Presumably the problem you are seeing is the failure to free the GPU memory allocated in fw_fini, depending on where it fails.  We should just allocate the memory in sw_init; that is what we do in other IPs.  Does the attached patch fix the issue you are seeing?
>
> Alex
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 57
> > +++++++++++++------------
> >  1 file changed, 29 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 5b9e48d44f5b..52b14efa848a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -2537,6 +2537,34 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >         return ret;
> >  }
> >
> > +static int psp_hw_fini(void *handle)
> > +{
> > +       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > +       struct psp_context *psp = &adev->psp;
> > +
> > +       if (psp->ta_fw) {
> > +               psp_ras_terminate(psp);
> > +               psp_securedisplay_terminate(psp);
> > +               psp_rap_terminate(psp);
> > +               psp_dtm_terminate(psp);
> > +               psp_hdcp_terminate(psp);
> > +       }
> > +
> > +       psp_asd_terminate(psp);
> > +
> > +       psp_tmr_terminate(psp);
> > +       psp_ring_destroy(psp, PSP_RING_TYPE__KM);
> > +
> > +       amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> > +                             &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> > +       amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> > +                             &psp->fence_buf_mc_addr, &psp->fence_buf);
> > +       amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> > +                             (void **)&psp->cmd_buf_mem);
> > +
> > +       return 0;
> > +}
> > +
> >  static int psp_hw_init(void *handle)
> >  {
> >         int ret;
> > @@ -2563,37 +2591,10 @@ static int psp_hw_init(void *handle)
> >  failed:
> >         adev->firmware.load_type = AMDGPU_FW_LOAD_DIRECT;
> >         mutex_unlock(&adev->firmware.mutex);
> > +       psp_hw_fini(handle);
> >         return -EINVAL;
> >  }
> >
> > -static int psp_hw_fini(void *handle)
> > -{
> > -       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > -       struct psp_context *psp = &adev->psp;
> > -
> > -       if (psp->ta_fw) {
> > -               psp_ras_terminate(psp);
> > -               psp_securedisplay_terminate(psp);
> > -               psp_rap_terminate(psp);
> > -               psp_dtm_terminate(psp);
> > -               psp_hdcp_terminate(psp);
> > -       }
> > -
> > -       psp_asd_terminate(psp);
> > -
> > -       psp_tmr_terminate(psp);
> > -       psp_ring_destroy(psp, PSP_RING_TYPE__KM);
> > -
> > -       amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> > -                             &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> > -       amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> > -                             &psp->fence_buf_mc_addr, &psp->fence_buf);
> > -       amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> > -                             (void **)&psp->cmd_buf_mem);
> > -
> > -       return 0;
> > -}
> > -
> >  static int psp_suspend(void *handle)
> >  {
> >         int ret;
> > --
> > 2.25.1
> >




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux