[AMD Official Use Only - General]

Tested-by: Bokun Zhang <Bokun.Zhang@xxxxxxx>

This patch is better since it extracts the unset code and only executes it in the SRIOV routine.
I have tested it with multi-VF.

Thanks!

-----Original Message-----
From: Alex Deucher <alexdeucher@xxxxxxxxx>
Sent: Thursday, October 6, 2022 3:56 PM
To: Zhang, Bokun <Bokun.Zhang@xxxxxxx>
Cc: Liu, Monk <Monk.Liu@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Deng, Emily <Emily.Deng@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdgpu: Fix SDMA engine resume issue under SRIOV

On Thu, Oct 6, 2022 at 2:11 PM Zhang, Bokun <Bokun.Zhang@xxxxxxx> wrote:
>
> [AMD Official Use Only - General]
>
> Hey guys,
> Please help review this patch for the suspend and resume issue.
> I have tested it in a multi-VF environment, and I think it is OK.

Seems a little hacky, but I think that's the least intrusive for stable. How about the attached patches?

Alex

>
> Thanks!
>
> -----Original Message-----
> From: Bokun Zhang <Bokun.Zhang@xxxxxxx>
> Sent: Thursday, October 6, 2022 2:09 PM
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Bokun <Bokun.Zhang@xxxxxxx>
> Subject: [PATCH] drm/amdgpu: Fix SDMA engine resume issue under SRIOV
>
> - Under SRIOV, SDMA engine is shared between VFs. Therefore,
>   we will not stop SDMA during hw_fini. This is not an issue
>   with normal driver loading and unloading.
>
> - However, when we put the SDMA engine to suspend state and resume
>   it, the issue starts to show up. Something could attempt to use
>   that SDMA engine to clear or move memory before the engine is
>   initialized since the DRM entity is still there.
>
> - Therefore, we will call sdma_v5_2_enable(false) during hw_fini,
>   and if we are under SRIOV, we will call sdma_v5_2_enable(true)
>   afterwards to allow other VFs to use SDMA. This way, the DRM
>   entity of SDMA engine is emptied and it will follow the flow
>   of resume code path.
>
> Signed-off-by: Bokun Zhang <Bokun.Zhang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> index f136fec7b4f4..3eaf1a573e73 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> @@ -1357,12 +1357,19 @@ static int sdma_v5_2_hw_fini(void *handle)
>  {
>         struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> -       if (amdgpu_sriov_vf(adev))
> -               return 0;
> -
> +       /*
> +        * Under SRIOV, the VF cannot single-mindedly stop SDMA engine
> +        * However, we still need to clean up the DRM entity
> +        * Therefore, we will re-enable SDMA afterwards.
> +        */
>         sdma_v5_2_ctx_switch_enable(adev, false);
>         sdma_v5_2_enable(adev, false);
>
> +       if (amdgpu_sriov_vf(adev)) {
> +               sdma_v5_2_enable(adev, true);
> +               sdma_v5_2_ctx_switch_enable(adev, true);
> +       }
> +
>         return 0;
>  }
>
> --
> 2.34.1
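
For readers following the thread without a kernel tree at hand, the control flow of the quoted hw_fini change can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct mock_device` and every `mock_*` function are hypothetical stand-ins invented here, not amdgpu APIs, and the model tracks only the enable flags, not the DRM entity cleanup the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct amdgpu_device; not the real driver type. */
struct mock_device {
	bool is_sriov_vf;        /* models the result of amdgpu_sriov_vf(adev) */
	bool sdma_enabled;       /* models the state set by sdma_v5_2_enable() */
	bool ctx_switch_enabled; /* models sdma_v5_2_ctx_switch_enable() */
	int sdma_toggles;        /* how many times the engine state was written */
};

/* Hypothetical mock: records the requested engine state. */
static void mock_sdma_enable(struct mock_device *dev, bool enable)
{
	dev->sdma_enabled = enable;
	dev->sdma_toggles++;
}

/* Hypothetical mock: records the requested context-switch state. */
static void mock_sdma_ctx_switch_enable(struct mock_device *dev, bool enable)
{
	dev->ctx_switch_enabled = enable;
}

/*
 * Mirrors the ordering in the quoted patch: always disable first, then
 * re-enable only when running as an SRIOV VF, so that the shared engine
 * stays usable by other VFs.
 */
static int mock_sdma_hw_fini(struct mock_device *dev)
{
	assert(dev != NULL);

	mock_sdma_ctx_switch_enable(dev, false);
	mock_sdma_enable(dev, false);

	if (dev->is_sriov_vf) {
		mock_sdma_enable(dev, true);
		mock_sdma_ctx_switch_enable(dev, true);
	}

	return 0;
}
```

In this model, a device flagged as a VF leaves `mock_sdma_hw_fini()` with the engine enabled again after one disable/enable cycle, while a non-VF device leaves it disabled. That matches the intent stated in the commit message, but it says nothing about how the real hardware or the attached follow-up patches behave.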