[AMD Official Use Only - General]

Hey guys,

Please help review this patch for the suspend and resume issue.
I have tested it in a multi-VF environment and I think it is OK.

Thanks!

-----Original Message-----
From: Bokun Zhang <Bokun.Zhang@xxxxxxx>
Sent: Thursday, October 6, 2022 2:09 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Zhang, Bokun <Bokun.Zhang@xxxxxxx>
Subject: [PATCH] drm/amdgpu: Fix SDMA engine resume issue under SRIOV

- Under SRIOV, the SDMA engine is shared between VFs, so we
  do not stop SDMA during hw_fini. This is not an issue
  for normal driver loading and unloading.

- However, when we suspend the SDMA engine and then resume it,
  the issue shows up: something could attempt to use the SDMA
  engine to clear or move memory before the engine is initialized,
  since the DRM entity is still there.

- Therefore, we call sdma_v5_2_enable(false) during hw_fini,
  and if we are under SRIOV, we call sdma_v5_2_enable(true)
  afterwards to allow other VFs to use SDMA. This way, the DRM
  entity of the SDMA engine is emptied and it follows the
  resume code path.

Signed-off-by: Bokun Zhang <Bokun.Zhang@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index f136fec7b4f4..3eaf1a573e73 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -1357,12 +1357,19 @@ static int sdma_v5_2_hw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-	if (amdgpu_sriov_vf(adev))
-		return 0;
-
+	/*
+	 * Under SRIOV, the VF cannot single-mindedly stop SDMA engine
+	 * However, we still need to clean up the DRM entity
+	 * Therefore, we will re-enable SDMA afterwards.
+	 */
 	sdma_v5_2_ctx_switch_enable(adev, false);
 	sdma_v5_2_enable(adev, false);
 
+	if (amdgpu_sriov_vf(adev)) {
+		sdma_v5_2_enable(adev, true);
+		sdma_v5_2_ctx_switch_enable(adev, true);
+	}
+
 	return 0;
 }
 
-- 
2.34.1
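
For reviewers who want to check the resulting call order without kernel headers, here is a minimal user-space sketch of the post-patch hw_fini flow. It is not driver code: struct mock_adev, the is_sriov_vf flag, the trace buffer, and every mock_* function are hypothetical stand-ins for struct amdgpu_device, amdgpu_sriov_vf(), sdma_v5_2_ctx_switch_enable() and sdma_v5_2_enable(). It only models the ordering described in the commit message and does not touch hardware or the DRM entity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct amdgpu_device; tracks only what the sketch needs. */
struct mock_adev {
	bool is_sriov_vf;        /* models the result of amdgpu_sriov_vf(adev) */
	bool sdma_enabled;       /* last state set by the enable call */
	bool ctx_switch_enabled; /* last state set by the ctx-switch call */
	char trace[128];         /* records the order of calls */
};

/* Append a short tag to the trace, never overflowing the buffer. */
static void mock_trace(struct mock_adev *adev, const char *what)
{
	strncat(adev->trace, what, sizeof(adev->trace) - strlen(adev->trace) - 1);
}

/* Models sdma_v5_2_ctx_switch_enable(adev, enable). */
static void mock_ctx_switch_enable(struct mock_adev *adev, bool enable)
{
	adev->ctx_switch_enabled = enable;
	mock_trace(adev, enable ? "ctx+ " : "ctx- ");
}

/* Models sdma_v5_2_enable(adev, enable). */
static void mock_sdma_enable(struct mock_adev *adev, bool enable)
{
	adev->sdma_enabled = enable;
	mock_trace(adev, enable ? "sdma+ " : "sdma- ");
}

/* Same control flow as the patched sdma_v5_2_hw_fini(). */
static int mock_hw_fini(struct mock_adev *adev)
{
	/* Always disable first, bare metal and VF alike. */
	mock_ctx_switch_enable(adev, false);
	mock_sdma_enable(adev, false);

	/* Under SRIOV the engine is shared, so turn it back on for other VFs. */
	if (adev->is_sriov_vf) {
		mock_sdma_enable(adev, true);
		mock_ctx_switch_enable(adev, true);
	}

	return 0;
}
```

Calling mock_hw_fini() on a zero-initialized struct mock_adev with is_sriov_vf set leaves both flags true and records a disable/disable/enable/enable sequence in trace; with is_sriov_vf clear, only the two disable calls are recorded.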