On Wed, Jan 12, 2022 at 3:36 AM Zhou1, Tao <Tao.Zhou1@xxxxxxx> wrote:
>
> [AMD Official Use Only]
>
>
>
> > -----Original Message-----
> > From: Chai, Thomas <YiPeng.Chai@xxxxxxx>
> > Sent: Wednesday, January 12, 2022 3:48 PM
> > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Cc: Chai, Thomas <YiPeng.Chai@xxxxxxx>; Zhang, Hawking
> > <Hawking.Zhang@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Clements,
> > John <John.Clements@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx>
> > Subject: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it
> > already exists in ras_list
> >
> > No longer insert ras blocks into ras_list if it already exists in ras_list.
> >
> > Signed-off-by: yipechai <YiPeng.Chai@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > index 62be0b4909b3..e6d3bb4b56e4 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > @@ -2754,9 +2754,17 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device
> > *adev)
> >  int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
> >  		struct amdgpu_ras_block_object* ras_block_obj)
> >  {
> > +	struct amdgpu_ras_block_object *obj, *tmp;
> >  	if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
> >  		return -EINVAL;
> >
> > +	/* If the ras object had been in ras_list, doesn't add it to ras_list again */
> [Tao] How about "If the ras object is in ras_list, don't add it again"
>
> > +	list_for_each_entry_safe(obj, tmp, &adev->ras_list, node) {
> > +		if (obj == ras_block_obj) {
> > +			return 0;
> > +		}
> > +	}
>
> [Tao] The patch is OK for me currently, but I think the root cause is we
> initialize adev->gmc.xgmi.ras in gmc_ras_late_init, the initialization should
> be called only in modprobe stage and we can create a general gmc_early_init for it.

Yes, please fix the root cause.  We should only be adding the blocks once.
This is just papering over the actual problem.

Alex

>
> > +
> >  	INIT_LIST_HEAD(&ras_block_obj->node);
> >  	list_add_tail(&ras_block_obj->node, &adev->ras_list);
> >
> > --
> > 2.25.1