On Tue, Nov 12, 2024 at 10:24 PM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote: > > > > On 11/13/2024 7:58 AM, Zhang, Jesse(Jie) wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > > > Hi, Lijo, > > > > -----Original Message----- > > From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > > Sent: Tuesday, November 12, 2024 10:54 PM > > To: Zhang, Jesse(Jie) <Jesse.Zhang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Prosyak, Vitaly <Vitaly.Prosyak@xxxxxxx>; Huang, Tim <Tim.Huang@xxxxxxx> > > Subject: Re: [PATCH 2/2] drm/amdgpu: fix vcn sw init failed > > > > > > > > On 11/12/2024 8:00 PM, Jesse.zhang@xxxxxxx wrote: > >> [ 2875.870277] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP > >> block <vcn_v4_0_3> failed -22 [ 2875.880494] amdgpu 0000:01:00.0: > >> amdgpu: amdgpu_device_ip_init failed [ 2875.887689] amdgpu > >> 0000:01:00.0: amdgpu: Fatal error during GPU init [ 2875.894791] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device. > >> > >> Add irqs with different IRQ source pointer for vcn0 and vcn1. > >> > >> Signed-off-by: Jesse Zhang <jesse.zhang@xxxxxxx> > >> --- > >> drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 19 +++++++++++++------ > >> 1 file changed, 13 insertions(+), 6 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c > >> b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c > >> index ef3dfd44a022..82b90f1e6f33 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c > >> @@ -83,6 +83,10 @@ static const struct amdgpu_hwip_reg_entry > >> vcn_reg_list_4_0_3[] = { > >> > >> #define NORMALIZE_VCN_REG_OFFSET(offset) \ > >> (offset & 0x1FFFF) > >> +static int amdgpu_ih_clientid_vcns[] = { > >> + SOC15_IH_CLIENTID_VCN, > >> + SOC15_IH_CLIENTID_VCN1 > > > > This is not valid for 4.0.3. It uses only the same client id, different node_id to distinguish. Also, there are max of 4 instances. > > > > I would say that entire IP instance series was done in a haste without applying thought and breaks other things including ip block mask. > > > > If the same client id is used, it returns -EINVAL (because of the following check) and sw init fails at step vcn_v4_0_3_sw_init / amdgpu_irq_add_id. > > > > amdgpu_irq_add_id: > > if (adev->irq.client[client_id].sources[src_id] != NULL) > > return -EINVAL; > > > > We had some side discussions on IP block-per-instance approach. > Personally, I was not in favour of it as I thought allowing IP block to > handle its own instances is the better approach and that could handle > dependencies between instances. Turns out that there are more like > handling common things for all instances as in this example. We tried that route as well before this one and it was ugly as well. > > I would prefer to revert the patch series and consider all angles before > moving forward on the approach. Will leave this to Alex/Christian/Leo on > the final decsion. Do the attached patches fix it? Alex > > Thanks, > Lijo > > > Regards > > Jesse > > > > > > Thanks, > > Lijo > > > >> +}; > >> > >> static int vcn_v4_0_3_start_sriov(struct amdgpu_device *adev); > >> static void vcn_v4_0_3_set_unified_ring_funcs(struct amdgpu_device > >> *adev, int inst); @@ -150,9 +154,9 @@ static int vcn_v4_0_3_sw_init(struct amdgpu_ip_block *ip_block) > >> if (r) > >> return r; > >> > >> - /* VCN DEC TRAP */ > >> - r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VCN, > >> - VCN_4_0__SRCID__UVD_ENC_GENERAL_PURPOSE, &adev->vcn.inst->irq); > >> + /* VCN UNIFIED TRAP */ > >> + r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[inst], > >> + VCN_4_0__SRCID__UVD_ENC_GENERAL_PURPOSE, > >> +&adev->vcn.inst[inst].irq); > >> if (r) > >> return r; > >> > >> @@ -174,7 +178,7 @@ static int vcn_v4_0_3_sw_init(struct > >> amdgpu_ip_block *ip_block) > >> > >> ring->vm_hub = AMDGPU_MMHUB0(adev->vcn.inst[inst].aid_id); > >> sprintf(ring->name, "vcn_unified_%d", adev->vcn.inst[inst].aid_id); > >> - r = amdgpu_ring_init(adev, ring, 512, &adev->vcn.inst->irq, 0, > >> + r = amdgpu_ring_init(adev, ring, 512, &adev->vcn.inst[inst].irq, 0, > >> AMDGPU_RING_PRIO_DEFAULT, > >> &adev->vcn.inst[inst].sched_score); > >> if (r) > >> @@ -1734,9 +1738,12 @@ static const struct amdgpu_irq_src_funcs vcn_v4_0_3_irq_funcs = { > >> */ > >> static void vcn_v4_0_3_set_irq_funcs(struct amdgpu_device *adev, int > >> inst) { > >> - adev->vcn.inst->irq.num_types++; > >> + if (adev->vcn.harvest_config & (1 << inst)) > >> + return; > >> + > >> + adev->vcn.inst[inst].irq.num_types = adev->vcn.num_enc_rings + 1; > >> > >> - adev->vcn.inst->irq.funcs = &vcn_v4_0_3_irq_funcs; > >> + adev->vcn.inst[inst].irq.funcs = &vcn_v4_0_3_irq_funcs; > >> } > >> > >> static void vcn_v4_0_3_print_ip_state(struct amdgpu_ip_block > >> *ip_block, struct drm_printer *p)
From 939a93c835c130720a346dfb6baad297f1f26e25 Mon Sep 17 00:00:00 2001 From: Alex Deucher <alexander.deucher@xxxxxxx> Date: Tue, 12 Nov 2024 23:26:52 -0500 Subject: [PATCH 2/2] drm/amdgpu: use a single set of interrupt funcs for vcn 4.0.3 There a single client and source id. The node id from the IV ring is used to determine which instance the interrupt belongs to. Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c index 88cbf1a88a07..8534f5370094 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c @@ -153,7 +153,7 @@ static int vcn_v4_0_3_sw_init(struct amdgpu_ip_block *ip_block) /* VCN DEC TRAP */ r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VCN, VCN_4_0__SRCID__UVD_ENC_GENERAL_PURPOSE, &adev->vcn.inst->irq); - if (r) + if (r && r != -EEXIST) return r; volatile struct amdgpu_vcn4_fw_shared *fw_shared; @@ -1741,7 +1741,7 @@ static const struct amdgpu_irq_src_funcs vcn_v4_0_3_irq_funcs = { */ static void vcn_v4_0_3_set_irq_funcs(struct amdgpu_device *adev, int inst) { - adev->vcn.inst->irq.num_types++; + adev->vcn.inst->irq.num_types = 1; adev->vcn.inst->irq.funcs = &vcn_v4_0_3_irq_funcs; } -- 2.47.0
From c9814f2b031a8a08c53a67e91e25f4e06b3e0b19 Mon Sep 17 00:00:00 2001 From: Alex Deucher <alexander.deucher@xxxxxxx> Date: Tue, 12 Nov 2024 22:51:01 -0500 Subject: [PATCH 1/2] drm/amdgpu: return -EEXIST if an interrupt handler exists So we can tell is one is already registered. Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 19ce4da285e8..2f66b8bba3c7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -408,7 +408,7 @@ int amdgpu_irq_add_id(struct amdgpu_device *adev, } if (adev->irq.client[client_id].sources[src_id] != NULL) - return -EINVAL; + return -EEXIST; if (source->num_types && !source->enabled_types) { atomic_t *types; -- 2.47.0