[AMD Official Use Only]

Yeah, you are right, I ignored the ras initialization failure case. Will update soon, thanks.

Regards,
Stanley

> -----Original Message-----
> From: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> Sent: Friday, November 26, 2021 9:11 PM
> To: Yang, Stanley <Stanley.Yang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
> Clements, John <John.Clements@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>;
> Li, Candice <Candice.Li@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx>
> Subject: RE: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed
> when unload drvier
>
> [AMD Official Use Only]
>
> I suspect it is still needed, especially when amdgpu_ras_fini is used to deal
> with ras initialization failure in psp_ras_initialize.
>
> Regards,
> Hawking
>
> -----Original Message-----
> From: Yang, Stanley <Stanley.Yang@xxxxxxx>
> Sent: Friday, November 26, 2021 21:08
> To: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
> Clements, John <John.Clements@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>;
> Li, Candice <Candice.Li@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx>
> Subject: RE: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed
> when unload drvier
>
> [AMD Official Use Only]
>
> It's not necessary, because before hw fini, all ras features have been
> disabled and con->features is set to zero.
>
> Regards,
> Stanley
>
> > -----Original Message-----
> > From: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> > Sent: Friday, November 26, 2021 8:57 PM
> > To: Yang, Stanley <Stanley.Yang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
> > Clements, John <John.Clements@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>;
> > Li, Candice <Candice.Li@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx>
> > Cc: Yang, Stanley <Stanley.Yang@xxxxxxx>
> > Subject: RE: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed
> > when unload drvier
> >
> > [AMD Official Use Only]
> >
> > Good catch. We still need to release the ras object in the end. Any reason
> > the sequence was removed?
> >
> > @@ -2564,9 +2563,6 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
> >
> >  	WARN(con->features, "Feature mask is not cleared");
> >
> > -	if (con->features)
> > -		amdgpu_ras_disable_all_features(adev, 1);
> > -
> >  	cancel_delayed_work_sync(&con->ras_counte_delay_work);
> >
> > Regards,
> > Hawking
> >
> > -----Original Message-----
> > From: Stanley.Yang <Stanley.Yang@xxxxxxx>
> > Sent: Friday, November 26, 2021 17:48
> > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>;
> > Clements, John <John.Clements@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>;
> > Li, Candice <Candice.Li@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx>
> > Cc: Yang, Stanley <Stanley.Yang@xxxxxxx>
> > Subject: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed
> > when unload drvier
> >
> > Function amdgpu_device_fini_hw is called before amdgpu_device_fini_sw,
> > so the ras ta is unloaded before the ras disable command is sent. The
> > ras disable operation must happen before hw fini.
> >
> > Signed-off-by: Stanley.Yang <Stanley.Yang@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    | 4 ----
> >  2 files changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 73ec46140d68..d5e642e90010 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2838,8 +2838,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
> >  	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
> >  		amdgpu_virt_release_ras_err_handler_data(adev);
> >
> > -	amdgpu_ras_pre_fini(adev);
> > -
> >  	if (adev->gmc.xgmi.num_physical_nodes > 1)
> >  		amdgpu_xgmi_remove_device(adev);
> >
> > @@ -3959,6 +3957,9 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
> >
> >  	amdgpu_fbdev_fini(adev);
> >
> > +	/* disable ras feature must before hw fini */
> > +	amdgpu_ras_pre_fini(adev);
> > +
> >  	amdgpu_device_ip_fini_early(adev);
> >
> >  	amdgpu_irq_fini_hw(adev);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > index 39dfd4d59881..65102d2a0a98 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > @@ -2484,7 +2484,6 @@ void amdgpu_ras_late_fini(struct amdgpu_device *adev,
> >  	amdgpu_ras_sysfs_remove(adev, ras_block);
> >  	if (ih_info->cb)
> >  		amdgpu_ras_interrupt_remove_handler(adev, ih_info);
> > -	amdgpu_ras_feature_enable(adev, ras_block, 0);
> >  }
> >
> >  /* do some init work after IP late init as dependence.
> > @@ -2564,9 +2563,6 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
> >
> >  	WARN(con->features, "Feature mask is not cleared");
> >
> > -	if (con->features)
> > -		amdgpu_ras_disable_all_features(adev, 1);
> > -
> >  	cancel_delayed_work_sync(&con->ras_counte_delay_work);
> >
> >  	amdgpu_ras_set_context(adev, NULL);
> > --
> > 2.17.1
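
For anyone following the ordering argument in the commit message, here is a minimal user-space sketch of it. This is a toy model only, not amdgpu code: struct toy_dev and every toy_* function are hypothetical names made up for illustration, and the only assumption carried over from the thread is that the ras disable command cannot be sent once hw fini has unloaded the ras ta.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for driver state; not the real amdgpu structures. */
struct toy_dev {
	bool ta_loaded;    /* models the ras ta still being loaded */
	uint32_t features; /* models the con->features mask */
};

/* Disabling features needs the ta. Returns 0 on success, -1 otherwise. */
static int toy_ras_disable_all(struct toy_dev *d)
{
	if (!d->ta_loaded)
		return -1; /* command cannot be sent, mask stays set */
	d->features = 0;
	return 0;
}

/* Models hw fini: the ta is unloaded here. */
static void toy_hw_fini(struct toy_dev *d)
{
	d->ta_loaded = false;
}

/* Old order: hw fini first, disable afterwards. Returns the leftover mask. */
static uint32_t toy_teardown_old(struct toy_dev *d)
{
	toy_hw_fini(d);
	(void)toy_ras_disable_all(d); /* fails, ta is already gone */
	return d->features;
}

/* Patched order: disable first, then hw fini. Returns the leftover mask. */
static uint32_t toy_teardown_new(struct toy_dev *d)
{
	(void)toy_ras_disable_all(d); /* succeeds, ta is still loaded */
	toy_hw_fini(d);
	return d->features;
}

/* Small self-check comparing the two orders on identical starting state. */
static void toy_demo(void)
{
	struct toy_dev a = { .ta_loaded = true, .features = 0x3 };
	struct toy_dev b = a;

	assert(toy_teardown_old(&a) == 0x3); /* mask is left set */
	assert(toy_teardown_new(&b) == 0);   /* mask is cleared */
}
```

Calling toy_demo() from a test program exercises both orders. The sketch does not model the case Hawking raised, where amdgpu_ras_fini is reached after a ras initialization failure in psp_ras_initialize.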