On Tue, Aug 16, 2022 at 10:54 PM Chai, Thomas <YiPeng.Chai@xxxxxxx> wrote: > > [AMD Official Use Only - General] > > Hi Alex: > When removing an amdgpu device, it may be difficult to change the order of psp_hw_fini calls. > > 1. The drm_dev_unplug call is at the beginning in the amdgpu_pci_remove function, which makes the gpu device inaccessible for userspace operations. If the call to psp_hw_fini was moved before drm_dev_unplug, userspace could access the gpu device but the psp might be removing. It has unknown issues. > +Andrey Grodzovsky We should fix the ordering in amdgpu_pci_remove() then I guess? There are lots of places where drm_dev_enter() is used to protect access to the hardware which could be similarly affected. Alex > 2. psp_hw_fini is called by the .hw_fini iterator in amdgpu_device_ip_fini_early, referring to the code starting from amdgpu_pci_remove to .hw_fini is called, > there are many preparatory operations before calling .hw_fini, which makes it very difficult to change the order of psp_hw_fini or all block .hw_fini. > > So can we do a workaround in psp_cmd_submit_buf when removing amdgpu device? > > -----Original Message----- > From: Alex Deucher <alexdeucher@xxxxxxxxx> > Sent: Monday, August 15, 2022 10:22 PM > To: Chai, Thomas <YiPeng.Chai@xxxxxxx> > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx> > Subject: Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled > > On Mon, Aug 15, 2022 at 3:06 AM YiPeng Chai <YiPeng.Chai@xxxxxxx> wrote: > > > > The psp_cmd_submit_buf function is called by psp_hw_fini to send TA > > unload messages to psp to terminate ras, asd and tmr. > > But when amdgpu is uninstalled, drm_dev_unplug is called earlier than > > psp_hw_fini in amdgpu_pci_remove, the calling order as follows: > > static void amdgpu_pci_remove(struct pci_dev *pdev) { > > drm_dev_unplug > > ...... > > amdgpu_driver_unload_kms->amdgpu_device_fini_hw->... > > ->.hw_fini->psp_hw_fini->... > > ->psp_ta_unload->psp_cmd_submit_buf > > ...... > > } > > The program will return when calling drm_dev_enter in > > psp_cmd_submit_buf. > > > > So the call to drm_dev_enter in psp_cmd_submit_buf should be removed, > > so that the TA unload messages can be sent to the psp when amdgpu is > > uninstalled. > > > > Signed-off-by: YiPeng Chai <YiPeng.Chai@xxxxxxx> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ---- > > 1 file changed, 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > index b067ce45d226..0578d8d094a7 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > @@ -585,9 +585,6 @@ psp_cmd_submit_buf(struct psp_context *psp, > > if (psp->adev->no_hw_access) > > return 0; > > > > - if (!drm_dev_enter(adev_to_drm(psp->adev), &idx)) > > - return 0; > > - > > This check is to prevent the hardware from being accessed if the card is removed. I think we need to fix the ordering elsewhere. > > Alex > > > memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE); > > > > memcpy(psp->cmd_buf_mem, cmd, sizeof(struct > > psp_gfx_cmd_resp)); @@ -651,7 +648,6 @@ psp_cmd_submit_buf(struct psp_context *psp, > > } > > > > exit: > > - drm_dev_exit(idx); > > return ret; > > } > > > > -- > > 2.25.1 > >