[AMD Official Use Only - General] Hi Christian, Thank you for such a quick review. It seems that the gfx_v11_0_cp_ecc_error_irq_funcs and amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0) in gmc_v11_0_hw_fini are not needed anymore. I will update this fix in the next version of the patch. Thanks, Horatio -----Original Message----- From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> Sent: Monday, April 24, 2023 6:50 PM To: Zhang, Horatio <Hongkun.Zhang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Cc: Yao, Longlong <Longlong.Yao@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx> Subject: Re: [PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini Am 24.04.23 um 12:37 schrieb Horatio Zhang: > The call trace occurred when the amdgpu is suspended before the mode1 > reset. For the IP block that do not support ras features, the relevant > interrupt is not opened during initialization, but hw_fini forced the > close of this interrupt, which resulted in amdgpu_irq_enabled check > warning.2 Sounds like it was a good idea to add this check. > > [ 102.873958] Call Trace: > [ 102.873959] <TASK> > [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [ 102.874019] > gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.874072] > amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.874122] > amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.874172] > amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.874223] > amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.874321] > amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.874375] > process_one_work+0x21f/0x3f0 [ 102.874377] worker_thread+0x200/0x3e0 > [ 102.874378] ? process_one_work+0x3f0/0x3f0 [ 102.874379] > kthread+0xfd/0x130 [ 102.874380] ? > kthread_complete_and_exit+0x20/0x20 > [ 102.874381] ret_from_fork+0x22/0x30 > > [ 102.980303] Call Trace: > [ 102.980303] <TASK> > [ 102.980304] gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [ 102.980357] > gmc_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.980409] > amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.980459] > amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.980520] > amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.980573] > amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.980687] > amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.980740] > process_one_work+0x21f/0x3f0 [ 102.980741] worker_thread+0x200/0x3e0 > [ 102.980742] ? process_one_work+0x3f0/0x3f0 [ 102.980743] > kthread+0xfd/0x130 [ 102.980743] ? > kthread_complete_and_exit+0x20/0x20 > [ 102.980744] ret_from_fork+0x22/0x30 > > Signed-off-by: Horatio Zhang <Hongkun.Zhang@xxxxxxx> Assuming the corresponding _get() calls are already protected by the same check: Reviewed-by: Christian König <christian.koenig@xxxxxxx> > --- > drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++- > drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 ++- > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > index 543af07ff102..0f6b037558bc 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > @@ -4483,7 +4483,8 @@ static int gfx_v11_0_hw_fini(void *handle) > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > int r; > > - amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0); > + if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX)) > + amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0); > amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0); > amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0); > > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c > b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c > index 3828ca95899f..25f466c26d18 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c > @@ -951,7 +951,8 @@ static int gmc_v11_0_hw_fini(void *handle) > return 0; > } > > - amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0); > + if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX)) > + amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0); > amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0); > gmc_v11_0_gart_disable(adev); >