[AMD Official Use Only - General]

Thanks for the info, Hawking. Yes, I just recalled Horatio had a solution in gfx11 to fix such warnings. I will provide patch set v2 to handle gfx v9 only.

Regards,
Guchun

> -----Original Message-----
> From: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> Sent: Monday, May 8, 2023 10:23 AM
> To: Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Chen, Guchun
> <Guchun.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher,
> Alexander <Alexander.Deucher@xxxxxxx>; Lazar, Lijo
> <Lijo.Lazar@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>;
> Zhang, Horatio <Hongkun.Zhang@xxxxxxx>
> Subject: RE: [PATCH] drm/amdgpu/gfx: disable cp_ecc_error_irq only when
> gfx ras is enabled in suspend
>
> Add @Zhang, Horatio
>
> Gfx11 should be addressed by Horatio's patch, not sure he committed yet.
> The solution is retiring cp_ecc_irq funcs since gfx11 doesn't rely on the irq for
> any software ras feature.
>
> Gfx9 could still add RAS block check since we have legacy ras feature that
> needs the interrupt.
>
> Hi Horatio,
>
> Did you commit your fix yet?
>
> Regards,
> Hawking
>
> -----Original Message-----
> From: Zhou1, Tao <Tao.Zhou1@xxxxxxx>
> Sent: Monday, May 8, 2023 10:16
> To: Chen, Guchun <Guchun.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
> Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Hawking
> <Hawking.Zhang@xxxxxxx>; Lazar, Lijo <Lijo.Lazar@xxxxxxx>; Koenig,
> Christian <Christian.Koenig@xxxxxxx>
> Subject: RE: [PATCH] drm/amdgpu/gfx: disable cp_ecc_error_irq only when
> gfx ras is enabled in suspend
>
> Reviewed-by: Tao Zhou <tao.zhou1@xxxxxxx>
>
> > -----Original Message-----
> > From: Chen, Guchun <Guchun.Chen@xxxxxxx>
> > Sent: Saturday, May 6, 2023 8:16 PM
> > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander
> > <Alexander.Deucher@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>;
> > Lazar, Lijo <Lijo.Lazar@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>;
> > Koenig, Christian <Christian.Koenig@xxxxxxx>
> > Cc: Chen, Guchun <Guchun.Chen@xxxxxxx>
> > Subject: [PATCH] drm/amdgpu/gfx: disable cp_ecc_error_irq only when
> > gfx ras is enabled in suspend
> >
> > cp_ecc_error_irq is only enabled when gfx ras is assert.
> > So in gfx_v9_0_hw_fini, interrupt disablement for cp_ecc_error_irq
> > should be executed under such condition, otherwise, an amdgpu_irq_put
> > calltrace will occur.
> >
> > [ 7283.170322] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
> > [ 7283.170964] RSP: 0018:ffff9a5fc3967d00 EFLAGS: 00010246
> > [ 7283.170967] RAX: ffff98d88afd3040 RBX: ffff98d89da20000 RCX: 0000000000000000
> > [ 7283.170969] RDX: 0000000000000000 RSI: ffff98d89da2bef8 RDI: ffff98d89da20000
> > [ 7283.170971] RBP: ffff98d89da20000 R08: ffff98d89da2ca18 R09: 0000000000000006
> > [ 7283.170973] R10: ffffd5764243c008 R11: 0000000000000000 R12: 0000000000001050
> > [ 7283.170975] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105
> > [ 7283.170978] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000
> > [ 7283.170981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 7283.170983] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0
> > [ 7283.170986] Call Trace:
> > [ 7283.170988] <TASK>
> > [ 7283.170989] gfx_v9_0_hw_fini+0x1c/0x6d0 [amdgpu]
> > [ 7283.171655] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu]
> > [ 7283.172245] amdgpu_device_suspend+0x103/0x180 [amdgpu]
> > [ 7283.172823] amdgpu_pmops_freeze+0x21/0x60 [amdgpu]
> > [ 7283.173412] pci_pm_freeze+0x54/0xc0
> > [ 7283.173419] ? __pfx_pci_pm_freeze+0x10/0x10
> > [ 7283.173425] dpm_run_callback+0x98/0x200
> > [ 7283.173430] __device_suspend+0x164/0x5f0
> >
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522
> >
> > Signed-off-by: Guchun Chen <guchun.chen@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++-
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 3 ++-
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > index ecf8ceb53311..f6bc62a94099 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > @@ -4442,7 +4442,8 @@ static int gfx_v11_0_hw_fini(void *handle)
> >  	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >  	int r;
> >
> > -	amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> > +	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> > +		amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> >  	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
> >  	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index ae09fc1cfe6b..c54d05bdc2d8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -3751,7 +3751,8 @@ static int gfx_v9_0_hw_fini(void *handle)
> >  {
> >  	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >
> > -	amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> > +	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> > +		amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> >  	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
> >  	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
> >
> > --
> > 2.25.1
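
For readers following the thread, the imbalance the patch guards against can be modelled in a few lines of user-space C. This is only a rough sketch and not the amdgpu code: every `model_*` name below is hypothetical, the reference counter stands in for whatever bookkeeping `amdgpu_irq_put` checks, and the `gfx_ras_supported` flag stands in for `amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX)`. It assumes, as the commit message states, that `cp_ecc_error_irq` is only enabled when gfx ras is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins; these are NOT the amdgpu structures. */
struct model_irq_src {
	unsigned int refcount;	/* outstanding "get" calls on this source */
};

struct model_gfx_dev {
	bool gfx_ras_supported;	/* models amdgpu_ras_is_supported(adev, ...GFX) */
	struct model_irq_src cp_ecc_error_irq;
	struct model_irq_src priv_reg_irq;
	unsigned int warnings;	/* number of unbalanced puts observed */
};

void model_irq_get(struct model_irq_src *src)
{
	src->refcount++;
}

/* Returns 0 on success, -1 (and records a warning) on a put without a get. */
int model_irq_put(struct model_gfx_dev *dev, struct model_irq_src *src)
{
	if (src->refcount == 0) {
		dev->warnings++;
		fprintf(stderr, "warning: irq put without matching get\n");
		return -1;
	}
	src->refcount--;
	return 0;
}

/* Enable path: the ECC interrupt is only taken when GFX RAS is supported. */
void model_hw_enable(struct model_gfx_dev *dev)
{
	if (dev->gfx_ras_supported)
		model_irq_get(&dev->cp_ecc_error_irq);
	model_irq_get(&dev->priv_reg_irq);
}

/* Teardown as it was before the patch: unconditional put on the ECC irq. */
void model_hw_fini_unguarded(struct model_gfx_dev *dev)
{
	(void)model_irq_put(dev, &dev->cp_ecc_error_irq);
	(void)model_irq_put(dev, &dev->priv_reg_irq);
}

/* Teardown with the patch: the put mirrors the condition of the get. */
void model_hw_fini_guarded(struct model_gfx_dev *dev)
{
	if (dev->gfx_ras_supported)
		(void)model_irq_put(dev, &dev->cp_ecc_error_irq);
	(void)model_irq_put(dev, &dev->priv_reg_irq);
}

/* Runs one enable/teardown cycle and returns the number of warnings. */
unsigned int model_suspend_cycle(bool ras_supported, bool guarded)
{
	struct model_gfx_dev dev = { .gfx_ras_supported = ras_supported };

	model_hw_enable(&dev);
	if (guarded)
		model_hw_fini_guarded(&dev);
	else
		model_hw_fini_unguarded(&dev);
	return dev.warnings;
}

/* Usage: only "RAS off + unguarded teardown" produces a warning. */
void model_self_check(void)
{
	assert(model_suspend_cycle(false, false) == 1);
	assert(model_suspend_cycle(false, true) == 0);
	assert(model_suspend_cycle(true, false) == 0);
	assert(model_suspend_cycle(true, true) == 0);
}
```

In this model the only failing combination is a device without GFX RAS torn down by the unguarded path, which matches the case the commit message describes. The gfx11 alternative mentioned by Hawking (retiring the cp_ecc_irq funcs) is not modelled here.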