[AMD Official Use Only - General]

Hi Mario,

Comments inline.

Thanks.

-----Original Message-----
From: Limonciello, Mario <Mario.Limonciello@xxxxxxx>
Sent: Monday, December 19, 2022 11:22 PM
To: Huang, Tim <Tim.Huang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Yifan <Yifan1.Zhang@xxxxxxx>; Ma, Li <Li.Ma@xxxxxxx>; Du, Xiaojian <Xiaojian.Du@xxxxxxx>
Subject: Re: drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0

On 12/19/2022 06:12, Tim Huang wrote:
> MES is part of gfxoff for S0i3 and does not require self-test after S0i3.
> Besides, self-test will free the BO that triggers a warning while in
> the suspend state.
>
> [ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
> [ 81.679435] Call Trace:
> [ 81.679726] <TASK>
> [ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
> [ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu]
> [ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu]
> [ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
> [ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu]
> [ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
> [ 81.684818] pci_pm_resume+0x5c/0xa0
> [ 81.685247] ? pci_pm_thaw+0x90/0x90
> [ 81.685658] dpm_run_callback+0x4e/0x160
> [ 81.686110] device_resume+0xad/0x210
> [ 81.686529] async_resume+0x1e/0x40
> [ 81.686931] async_run_entry_fn+0x33/0x120
> [ 81.687405] process_one_work+0x21d/0x3f0
> [ 81.687869] worker_thread+0x4a/0x3c0
> [ 81.688293] ? process_one_work+0x3f0/0x3f0
> [ 81.688777] kthread+0xff/0x130
> [ 81.689157] ? kthread_complete_and_exit+0x20/0x20
> [ 81.689707] ret_from_fork+0x22/0x30
> [ 81.690118] </TASK>
> [ 81.690380] ---[ end trace 0000000000000000 ]---

Is this still needed with https://patchwork.freedesktop.org/patch/515278/ ?

Patch 515278 skips the MES suspend and resume, but the self-test is still called by IP late init.
Please see patch v2 for details.

>
> Signed-off-by: Tim Huang <tim.huang@xxxxxxx>
> ---
> drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index 5459366f49ff..80e8cf826e71 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -1342,7 +1342,7 @@ static int mes_v11_0_late_init(void *handle)
>  {
>  	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> -	if (!amdgpu_in_reset(adev) &&
> +	if (!amdgpu_in_reset(adev) && !adev->in_suspend &&

I think in this case you should be using adev->in_s0ix instead.

Yes, adev->in_s0ix should be better, thanks for pointing that out.

>  	    (adev->ip_versions[GC_HWIP][0] != IP_VERSION(11, 0, 3)))
>  		amdgpu_mes_self_test(adev);
>
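
To illustrate the in_s0ix suggestion above, here is a minimal standalone sketch of the resulting check. It is not the v2 patch and not kernel code: struct mock_adev, MOCK_IP_VERSION() and mes_should_self_test() are hypothetical stand-ins for the few amdgpu_device fields the condition reads, and it assumes in_suspend is set on every suspend/resume cycle while in_s0ix is set only for S0i3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the amdgpu_device state the check reads. */
struct mock_adev {
	bool in_reset;       /* models amdgpu_in_reset(adev) */
	bool in_suspend;     /* assumed: set for any suspend/resume cycle */
	bool in_s0ix;        /* assumed: set only for S0i3 cycles */
	uint32_t gc_version; /* models adev->ip_versions[GC_HWIP][0] */
};

/* Hypothetical version packing, used only to compare versions here. */
#define MOCK_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/*
 * Returns true when the MES self-test would run at late init.
 * The test is skipped during a reset, during S0i3 resume, and on GC 11.0.3.
 * in_suspend is deliberately not consulted, so a non-S0i3 resume still
 * runs the self-test.
 */
static bool mes_should_self_test(const struct mock_adev *adev)
{
	return !adev->in_reset && !adev->in_s0ix &&
	       adev->gc_version != MOCK_IP_VERSION(11, 0, 3);
}
```

Under these assumptions, keying on in_s0ix rather than in_suspend narrows the skip to the S0i3 path that produced the warning, and leaves the self-test in place for other resume paths.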