> -----Original Message-----
> From: Chen, Guchun <Guchun.Chen@xxxxxxx>
> Sent: Thursday, August 1, 2019 4:22 PM
> To: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>;
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Li, Dennis <Dennis.Li@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>
> Cc: Zhou1, Tao <Tao.Zhou1@xxxxxxx>
> Subject: RE: [PATCH 0/4] enable umc ras ce interrupt
>
> 1) In patch 1, it looks like our callback only ever returns the UE case, but I
> assume the CE case should also be covered. Maybe that's a separate topic.
>
>  	if (ret == AMDGPU_RAS_UE) {
> +		/* these counts could be left as 0 if
> +		 * some blocks do not count error number
> +		 */
>  		obj->err_data.ue_count += err_data.ue_count;
> +		obj->err_data.ce_count += err_data.ce_count;

[Tao] Yes, it's a separate topic. CE can also trigger the interrupt, and a
single RAS query can even report both CE and UE errors. I think
AMDGPU_RAS_SUCCESS is the more suitable return value here; I'll provide a new
patch to fix it.

> 2) In patch 2, there is an unused variable, "ras_error_status". Do we need to
> remove it?
>
>  static void umc_v6_1_ras_init(struct amdgpu_device *adev)
>  {
> +	void *ras_error_status = NULL;
>
> +	amdgpu_umc_for_each_channel(umc_v6_1_ras_init_per_channel);
>  }

[Tao] It's intentional. The amdgpu_umc_for_each_channel macro is a common
definition shared by all UMC per-channel functions, and it passes
ras_error_status on to the channel function, so the variable has to be
declared here even though umc_v6_1_ras_init() never reads it.

>
> Regards,
> Guchun
>
> -----Original Message-----
> From: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
> Sent: Thursday, August 1, 2019 3:52 PM
> To: Zhou1, Tao <Tao.Zhou1@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Li, Dennis <Dennis.Li@xxxxxxx>;
> Chen, Guchun <Guchun.Chen@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>
> Cc: Zhou1, Tao <Tao.Zhou1@xxxxxxx>
> Subject: RE: [PATCH 0/4] enable umc ras ce interrupt
>
> 1) Please fix the typo in the patch #2 description: ec --> ce
>
> 2)
> Patch #2:
>
> +	ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel, UMCCH0_0_EccErrCntSel,
> +					EccErrInt, 0x1);
>
> For the EccErrInt field, it should be programmed to (MAX - INIT), correct?
> But the hardcoded value does not seem to match the value calculated by those
> macros.
>
> Regards,
> Hawking
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Tao Zhou
> Sent: Thursday, August 1, 2019 2:54 PM
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>;
> Li, Dennis <Dennis.Li@xxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>;
> Pan, Xinhui <Xinhui.Pan@xxxxxxx>
> Cc: Zhou1, Tao <Tao.Zhou1@xxxxxxx>
> Subject: [PATCH 0/4] enable umc ras ce interrupt
>
> These patches add support for the UMC CE interrupt. The interrupt is
> controlled by an error count threshold.
>
> Tao Zhou (4):
>   drm/amdgpu: support ce interrupt in ras module
>   drm/amdgpu: implement umc ras init function
>   drm/amdgpu: update the calc algorithm of umc ecc error count
>   drm/amdgpu: only uncorrectable error needs gpu reset
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12 ++++++++---
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  6 +++++-
>  drivers/gpu/drm/amd/amdgpu/umc_v6_1.c   | 42 ++++++++++++++++++++++---
>  drivers/gpu/drm/amd/amdgpu/umc_v6_1.h   |  7 +++++
>  4 files changed, 58 insertions(+), 9 deletions(-)
>
> --
> 2.17.1
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
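Tao's reply to point 1 argues that one RAS query can report corrected (CE) and uncorrectable (UE) errors together, so both counters should be accumulated whenever the query succeeds, and patch 4 says only a UE needs a GPU reset. The following is a minimal sketch of that logic under stated assumptions: `struct sketch_err_data`, `enum sketch_ras_ret` and `sketch_accumulate()` are hypothetical names that only mirror the fields quoted in the thread; this is not the amdgpu implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the two counters quoted in the thread. */
struct sketch_err_data {
	unsigned long ue_count; /* uncorrectable errors */
	unsigned long ce_count; /* correctable errors */
};

/* Hypothetical status codes; the real AMDGPU_RAS_* values may differ. */
enum sketch_ras_ret {
	SKETCH_RAS_SUCCESS = 0,
	SKETCH_RAS_FAIL,
};

/*
 * Fold one query result into the running totals. On success both
 * counters are added, because one query may report CE and UE at once.
 * Returns true only when uncorrectable errors were seen, i.e. when a
 * GPU reset would be warranted.
 */
static bool sketch_accumulate(struct sketch_err_data *total,
			      const struct sketch_err_data *query,
			      enum sketch_ras_ret ret)
{
	assert(total != NULL && query != NULL);

	if (ret != SKETCH_RAS_SUCCESS)
		return false;

	/* Either count may stay 0 for blocks that do not count that type. */
	total->ue_count += query->ue_count;
	total->ce_count += query->ce_count;

	return query->ue_count > 0;
}
```

Keying the accumulation on a generic success status, rather than on a UE-specific return value, is what lets a CE-only query still update ce_count without requesting a reset.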
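Tao's answer to point 2 depends on how a per-channel macro expands. The body of amdgpu_umc_for_each_channel is not shown in this thread, so the following is a hypothetical stand-in (all `sketch_*` names are invented): because the expansion names `adev` and `ras_error_status` directly, the init function has to declare `ras_error_status` even though it never reads it. Removing the declaration would make the expansion fail to compile with an undeclared identifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct amdgpu_device; only what the sketch needs. */
struct sketch_adev {
	unsigned int channel_count;
	unsigned int channels_visited;
};

/*
 * Hypothetical per-channel iterator in the spirit of
 * amdgpu_umc_for_each_channel(). The expansion refers to `adev` and
 * `ras_error_status` by name, so both must be in scope at the call site.
 */
#define sketch_for_each_channel(func)                                    \
	do {                                                             \
		unsigned int ch_idx;                                     \
		for (ch_idx = 0; ch_idx < adev->channel_count; ch_idx++) \
			func(adev, ch_idx, ras_error_status);            \
	} while (0)

/* Init callback: same shape as a query callback, but ignores the status. */
static void sketch_ras_init_per_channel(struct sketch_adev *adev,
					unsigned int ch_idx,
					void *ras_error_status)
{
	(void)ch_idx;
	(void)ras_error_status;
	adev->channels_visited++;
}

static void sketch_ras_init(struct sketch_adev *adev)
{
	/* Never read in this function, but the macro passes it to each callback. */
	void *ras_error_status = NULL;

	assert(adev != NULL);
	sketch_for_each_channel(sketch_ras_init_per_channel);
}
```

Sharing one iterator between query and init callbacks keeps the channel-walking code in one place, at the cost of a variable that looks unused in callers that do not need it.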
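One possible reading of Hawking's (MAX - INIT) remark, stated purely as an assumption: the error counter is preloaded with an INIT value and raises the interrupt once it reaches MAX, so MAX - INIT errors are tolerated per interrupt, and the error count read back would need INIT subtracted. The sketch below models only that arithmetic; the `SKETCH_*` macro values and helper names are hypothetical and are not taken from umc_v6_1.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from umc_v6_1.h. */
#define SKETCH_CE_CNT_MAX       0xffffu /* assumed counter ceiling */
#define SKETCH_CE_INT_THRESHOLD 0x0010u /* assumed errors per interrupt */
#define SKETCH_CE_CNT_INIT      (SKETCH_CE_CNT_MAX - SKETCH_CE_INT_THRESHOLD)

/* Under the assumed model, errors tolerated before the interrupt: MAX - INIT. */
static uint32_t sketch_ce_threshold(void)
{
	return SKETCH_CE_CNT_MAX - SKETCH_CE_CNT_INIT;
}

/* Errors seen since the counter was preloaded with INIT. */
static uint32_t sketch_ce_count(uint32_t raw_counter)
{
	assert(raw_counter >= SKETCH_CE_CNT_INIT &&
	       raw_counter <= SKETCH_CE_CNT_MAX);
	return raw_counter - SKETCH_CE_CNT_INIT;
}
```

Under this model, a hardcoded field value would be compared against sketch_ce_threshold(); whether EccErrInt actually holds a threshold at all is the question Hawking raises.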