Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2022-08-17 09:44, Alex Deucher wrote:
On Tue, Aug 16, 2022 at 10:54 PM Chai, Thomas <YiPeng.Chai@xxxxxxx> wrote:
[AMD Official Use Only - General]

Hi Alex:
   When removing an amdgpu device, it may be difficult to change the order of psp_hw_fini calls.

1. The drm_dev_unplug call is at the beginning in the amdgpu_pci_remove function,  which makes the gpu device inaccessible for userspace operations.  If the call to psp_hw_fini was moved before drm_dev_unplug,  userspace could access the gpu device but the psp might be removing. It has unknown issues.

+Andrey Grodzovsky

We should fix the ordering in amdgpu_pci_remove() then I guess?  There
are lots of places where drm_dev_enter() is used to protect access to
the hardware which could be similarly affected.

Alex


We probably can try to move drm_dev_unplug after amdgpu_driver_unload_kms. I don't remember now why drm_dev_unplug must be the first thing we do in amdgpu_pci_remove and what impact it will have but maybe give it a try. Also see if you can run libdrm hotplug test suite before and after the change.

Andrey



2. psp_hw_fini is called by the .hw_fini iterator in amdgpu_device_ip_fini_early, referring to the code starting from amdgpu_pci_remove to .hw_fini is called,
    there are many preparatory operations before calling .hw_fini,  which makes it very difficult to change the order of psp_hw_fini or all block .hw_fini.

    So can we do a workaround in psp_cmd_submit_buf when removing amdgpu device?

-----Original Message-----
From: Alex Deucher <alexdeucher@xxxxxxxxx>
Sent: Monday, August 15, 2022 10:22 PM
To: Chai, Thomas <YiPeng.Chai@xxxxxxx>
Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled

On Mon, Aug 15, 2022 at 3:06 AM YiPeng Chai <YiPeng.Chai@xxxxxxx> wrote:
The psp_cmd_submit_buf function is called by psp_hw_fini to send TA
unload messages to psp to terminate ras, asd and tmr.
But when amdgpu is uninstalled, drm_dev_unplug is called earlier than
psp_hw_fini in amdgpu_pci_remove, the calling order as follows:
static void amdgpu_pci_remove(struct pci_dev *pdev) {
         drm_dev_unplug
         ......
         amdgpu_driver_unload_kms->amdgpu_device_fini_hw->...
                 ->.hw_fini->psp_hw_fini->...
                 ->psp_ta_unload->psp_cmd_submit_buf
         ......
}
The program will return when calling drm_dev_enter in
psp_cmd_submit_buf.

So the call to drm_dev_enter in psp_cmd_submit_buf should be removed,
so that the TA unload messages can be sent to the psp when amdgpu is
uninstalled.

Signed-off-by: YiPeng Chai <YiPeng.Chai@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ----
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index b067ce45d226..0578d8d094a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -585,9 +585,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
         if (psp->adev->no_hw_access)
                 return 0;

-       if (!drm_dev_enter(adev_to_drm(psp->adev), &idx))
-               return 0;
-
This check is to prevent the hardware from being accessed if the card is removed.  I think we need to fix the ordering elsewhere.

Alex

         memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);

         memcpy(psp->cmd_buf_mem, cmd, sizeof(struct
psp_gfx_cmd_resp)); @@ -651,7 +648,6 @@ psp_cmd_submit_buf(struct psp_context *psp,
         }

  exit:
-       drm_dev_exit(idx);
         return ret;
  }

--
2.25.1




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux