RE: [PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only - General]

Hi Hawking,

Thank you for such a quick review, I think your suggestion is a better choice for solving this issue.
I have updated the fix patch about this issue, could you please help me review it

[PATCH v2 1/3] drm/amdgpu: gfx_v11_0_cp_ecc_error_irq_funcs is not needed any more
[PATCH v2 2/3] drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini
[PATCH v2 3/3] drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini

Thanks,
Horatio

-----Original Message-----
From: Zhang, Hawking <Hawking.Zhang@xxxxxxx> 
Sent: Monday, April 24, 2023 7:28 PM
To: Zhang, Horatio <Hongkun.Zhang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Chen, Guchun <Guchun.Chen@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Yao, Longlong <Longlong.Yao@xxxxxxx>; Zhang, Horatio <Hongkun.Zhang@xxxxxxx>
Subject: RE: [PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini

[AMD Official Use Only - General]

Let's handle umc and gfx ras cases in separated patch.

gmc.ecc_irq maps to df mca interrupt, which is enabled by firmware per IFWI setting. Check GFX RAS mask is not correct in such case. host driver is not privileged to enable/disable the interrupt. As you noticed already the set function is dummy one. then just drop amdgpu_irq_put call for gmc.ecc_irq should be fine.

gfx.cp_ecc_error_irq is not used in gfx11. Feel free to retire the interrupt for gfx11 in another patch (i.e., gfx_v11_0_cp_ecc_error_irq_funcs is not needed any more)

Regards,
Hawking

-----Original Message-----
From: Horatio Zhang <Hongkun.Zhang@xxxxxxx>
Sent: Monday, April 24, 2023 18:37
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Chen, Guchun <Guchun.Chen@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Yao, Longlong <Longlong.Yao@xxxxxxx>; Zhang, Horatio <Hongkun.Zhang@xxxxxxx>
Subject: [PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini

The call trace occurred when the amdgpu is suspended before the mode1 reset. For the IP block that do not support ras features, the relevant interrupt is not opened during initialization, but hw_fini forced the close of this interrupt, which resulted in amdgpu_irq_enabled check warning.

[  102.873958] Call Trace:
[  102.873959]  <TASK>
[  102.873961]  gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [  102.874019]  gfx_v11_0_suspend+0xe/0x20 [amdgpu] [  102.874072]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [  102.874122]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [  102.874172]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [  102.874223]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [  102.874321]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [  102.874375]  process_one_work+0x21f/0x3f0 [  102.874377]  worker_thread+0x200/0x3e0 [  102.874378]  ? process_one_work+0x3f0/0x3f0 [  102.874379]  kthread+0xfd/0x130 [  102.874380]  ? kthread_complete_and_exit+0x20/0x20
[  102.874381]  ret_from_fork+0x22/0x30

[  102.980303] Call Trace:
[  102.980303]  <TASK>
[  102.980304]  gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [  102.980357]  gmc_v11_0_suspend+0xe/0x20 [amdgpu] [  102.980409]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [  102.980459]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [  102.980520]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [  102.980573]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [  102.980687]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [  102.980740]  process_one_work+0x21f/0x3f0 [  102.980741]  worker_thread+0x200/0x3e0 [  102.980742]  ? process_one_work+0x3f0/0x3f0 [  102.980743]  kthread+0xfd/0x130 [  102.980743]  ? kthread_complete_and_exit+0x20/0x20
[  102.980744]  ret_from_fork+0x22/0x30

Signed-off-by: Horatio Zhang <Hongkun.Zhang@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++-  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 543af07ff102..0f6b037558bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -4483,7 +4483,8 @@ static int gfx_v11_0_hw_fini(void *handle)
        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
        int r;

-       amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
+       if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
+               amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 3828ca95899f..25f466c26d18 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -951,7 +951,8 @@ static int gmc_v11_0_hw_fini(void *handle)
                return 0;
        }

-       amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
+       if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
+               amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
        amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
        gmc_v11_0_gart_disable(adev);

--
2.34.1





[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux