[AMD Official Use Only - General]

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Horatio Zhang
> Sent: Monday, May 8, 2023 6:20 PM
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Liu, HaoPing (Alan) <HaoPing.Liu@xxxxxxx>; Zhang, Horatio
> <Hongkun.Zhang@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Zhou1, Tao
> <Tao.Zhou1@xxxxxxx>; Jiang, Sonny <Sonny.Jiang@xxxxxxx>; Limonciello,
> Mario <Mario.Limonciello@xxxxxxx>; Liu, Leo <Leo.Liu@xxxxxxx>; Zhang,
> Hawking <Hawking.Zhang@xxxxxxx>
> Subject: [PATCH 2/2] drm/amdgpu: fix amdgpu_irq_put call trace in vcn_v4_0_hw_fini
>
> During suspend, vcn_v4_0_hw_fini uses amdgpu_irq_put to disable the irq of
> vcn.inst, but that irq was never enabled (no matching amdgpu_irq_get) during
> the resume process, which results in a call trace during GPU reset.
>
> [ 44.563572] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
> [ 44.563629] RSP: 0018:ffffb36740edfc90 EFLAGS: 00010246
> [ 44.563630] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> [ 44.563630] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 44.563631] RBP: ffffb36740edfcb0 R08: 0000000000000000 R09: 0000000000000000
> [ 44.563631] R10: 0000000000000000 R11: 0000000000000000 R12: ffff954c568e2ea8
> [ 44.563631] R13: 0000000000000000 R14: ffff954c568c0000 R15: ffff954c568e2ea8
> [ 44.563632] FS: 0000000000000000(0000) GS:ffff954f584c0000(0000) knlGS:0000000000000000
> [ 44.563632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 44.563633] CR2: 00007f028741ba70 CR3: 000000026ca10000 CR4: 0000000000750ee0
> [ 44.563633] PKRU: 55555554
> [ 44.563633] Call Trace:
> [ 44.563634] <TASK>
> [ 44.563634] vcn_v4_0_hw_fini+0x62/0x160 [amdgpu]
> [ 44.563700] vcn_v4_0_suspend+0x13/0x30 [amdgpu]
> [ 44.563755] amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
> [ 44.563806] amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
> [ 44.563858] amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
> [ 44.563909] amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
> [ 44.564006] amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
> [ 44.564061] process_one_work+0x21f/0x400
> [ 44.564062] worker_thread+0x200/0x3f0
> [ 44.564063] ? process_one_work+0x400/0x400
> [ 44.564064] kthread+0xee/0x120
> [ 44.564065] ? kthread_complete_and_exit+0x20/0x20
> [ 44.564066] ret_from_fork+0x22/0x30
>
> Fixes: ea5309de7388 ("drm/amdgpu: add VCN 4.0 RAS poison consumption handling")
> Signed-off-by: Horatio Zhang <Hongkun.Zhang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> index bf0674039598..b55eb1bf3e30 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> @@ -281,6 +281,21 @@ static int vcn_v4_0_hw_init(void *handle)
>  	return r;
>  }
>
> +static int vcn_v4_0_late_init(void *handle)
> +{
> +	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	int i;
> +
> +	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
> +		if (adev->vcn.harvest_config & (1 << i))
> +			continue;
> +
> +		amdgpu_irq_get(adev, &adev->vcn.inst[i].irq, 0);

[Tao] We could also check its return value and bail out if it is non-zero,
but either way is fine with me.

> +	}
> +
> +	return 0;
> +}
> +
>  /**
>   * vcn_v4_0_hw_fini - stop the hardware block
>   *
> @@ -2047,7 +2062,7 @@ static void vcn_v4_0_set_irq_funcs(struct amdgpu_device *adev)
>  static const struct amd_ip_funcs vcn_v4_0_ip_funcs = {
>  	.name = "vcn_v4_0",
>  	.early_init = vcn_v4_0_early_init,
> -	.late_init = NULL,
> +	.late_init = vcn_v4_0_late_init,
>  	.sw_init = vcn_v4_0_sw_init,
>  	.sw_fini = vcn_v4_0_sw_fini,
>  	.hw_init = vcn_v4_0_hw_init,
> --
> 2.34.1
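The get/put imbalance this patch fixes can be modeled in a few lines of user-space C. This is a simplified sketch, not driver code: every `model_*` name, field, and constant is hypothetical, and it assumes (as the call trace suggests) that amdgpu_irq_put() warns when the per-instance enable count is already zero. model_late_init() follows the loop shape of the patch's vcn_v4_0_late_init(), including the skip for harvested instances, and adds the early return suggested in the review; model_hw_fini() assumes hw_fini drops one reference per non-harvested instance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for the driver's IRQ state. */
#define MODEL_MAX_INST 4
#define MODEL_EINVAL 22

struct model_irq {
	int refcount;     /* get() calls not yet matched by a put() */
	bool hw_enabled;  /* true while refcount > 0 */
};

struct model_device {
	int num_inst;
	unsigned int harvest_config; /* bit i set => instance i is harvested */
	unsigned int fail_get_mask;  /* bit i set => get() on instance i fails */
	int warnings;                /* unbalanced put() calls observed */
	struct model_irq irq[MODEL_MAX_INST];
};

static int model_irq_get(struct model_device *dev, int i)
{
	assert(i >= 0 && i < MODEL_MAX_INST);
	if (dev->fail_get_mask & (1u << i))
		return -MODEL_EINVAL;
	if (dev->irq[i].refcount++ == 0)
		dev->irq[i].hw_enabled = true; /* first reference enables */
	return 0;
}

static int model_irq_put(struct model_device *dev, int i)
{
	assert(i >= 0 && i < MODEL_MAX_INST);
	if (dev->irq[i].refcount == 0) {
		/* put() with no matching get(): the imbalance behind the trace */
		dev->warnings++;
		fprintf(stderr, "WARN: put on inst %d without matching get\n", i);
		return -MODEL_EINVAL;
	}
	if (--dev->irq[i].refcount == 0)
		dev->irq[i].hw_enabled = false; /* last reference disables */
	return 0;
}

/* Same loop shape as the patch's late_init, plus the early return. */
static int model_late_init(struct model_device *dev)
{
	for (int i = 0; i < dev->num_inst; ++i) {
		if (dev->harvest_config & (1u << i))
			continue;
		int r = model_irq_get(dev, i);
		if (r)
			return r;
	}
	return 0;
}

/* Assumed hw_fini behaviour: one put per non-harvested instance. */
static void model_hw_fini(struct model_device *dev)
{
	for (int i = 0; i < dev->num_inst; ++i) {
		if (dev->harvest_config & (1u << i))
			continue;
		model_irq_put(dev, i);
	}
}
```

Calling model_hw_fini() on a fresh model_device without a prior model_late_init() reproduces the unbalanced put (one warning per instance); calling model_late_init() first keeps the counts balanced. Returning early on a failed get, as suggested in the review comment, leaves references already taken on earlier instances in place; whether to unwind them is a separate design choice.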