[AMD Official Use Only - General]

-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1@xxxxxxx>
Sent: Thursday, April 25, 2024 4:31 PM
To: Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Li, Candice <Candice.Li@xxxxxxx>
Subject: RE: [PATCH 4/4] drm/amdgpu: avoid dump mca bank log muti times during ras ISR

> -----Original Message-----
> From: Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
> Sent: Tuesday, April 23, 2024 4:27 PM
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Zhou1, Tao
> <Tao.Zhou1@xxxxxxx>; Li, Candice <Candice.Li@xxxxxxx>
> Subject: [PATCH 4/4] drm/amdgpu: avoid dump mca bank log muti times
> during ras ISR
>
> because the ue valid mca count will only be cleared after gpu reset,
> so only dump mca log on the first time to get mca bank after receive RAS interrupt.
>
> Signed-off-by: Yang Wang <kevinyang.wang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 28 +++++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h |  1 +
>  2 files changed, 29 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
> index 264f56fd4f66..b581523fa8d7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
> @@ -229,6 +229,8 @@ int amdgpu_mca_init(struct amdgpu_device *adev)
>  	struct mca_bank_cache *mca_cache;
>  	int i;
>
> +	atomic_set(&mca->ue_update_flag, 0);
> +
>  	for (i = 0; i < ARRAY_SIZE(mca->mca_caches); i++) {
>  		mca_cache = &mca->mca_caches[i];
>  		mutex_init(&mca_cache->lock);
> @@ -244,6 +246,8 @@ void amdgpu_mca_fini(struct amdgpu_device *adev)
>  	struct mca_bank_cache *mca_cache;
>  	int i;
>
> +	atomic_set(&mca->ue_update_flag, 0);
> +
>  	for (i = 0; i < ARRAY_SIZE(mca->mca_caches); i++) {
>  		mca_cache = &mca->mca_caches[i];
>  		amdgpu_mca_bank_set_release(&mca_cache->mca_set);
> @@ -325,6 +329,27 @@ static int amdgpu_mca_smu_get_mca_entry(struct amdgpu_device *adev, enum amdgpu_
>  	return mca_funcs->mca_get_mca_entry(adev, type, idx, entry);
>  }
>
> +static bool amdgpu_mca_bank_should_update(struct amdgpu_device *adev, enum amdgpu_mca_error_type type)
> +{
> +	struct amdgpu_mca *mca = &adev->mca;
> +	bool ret = true;
> +
> +	/*
> +	 * Because the UE Valid MCA count will only be cleared after reset,
> +	 * in order to avoid repeated counting of the error count,
> +	 * the aca bank is only updated once during the gpu recovery stage.
> +	 */
> +	if (type == AMDGPU_MCA_ERROR_TYPE_UE) {
> +		if (amdgpu_ras_intr_triggered())
> +			ret = atomic_cmpxchg(&mca->ue_update_flag, 0, 1) == 0;
> +		else
> +			atomic_set(&mca->ue_update_flag, 0);
> +	}
> +
> +	return ret;
> +}
> +
> +

[Tao] redundant line, with this fixed, the patch is:

Reviewed-by: Tao Zhou <tao.zhou1@xxxxxxx>

[Kevin]: Thanks for reminding me.

Best Regards,
Kevin

>  static int amdgpu_mca_smu_get_mca_set(struct amdgpu_device *adev, enum amdgpu_mca_error_type type, struct mca_bank_set *mca_set,
>  				      struct ras_query_context *qctx)
>  {
> @@ -335,6 +360,9 @@ static int amdgpu_mca_smu_get_mca_set(struct amdgpu_device *adev, enum amdgpu_mc
>  	if (!mca_set)
>  		return -EINVAL;
>
> +	if (!amdgpu_mca_bank_should_update(adev, type))
> +		return 0;
> +
>  	ret = amdgpu_mca_smu_get_valid_mca_count(adev, type, &count);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h
> index 9b97cfa28e05..e80323ff90c1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h
> @@ -93,6 +93,7 @@ struct amdgpu_mca {
>  	struct amdgpu_mca_ras mpio;
>  	const struct amdgpu_mca_smu_funcs *mca_funcs;
>  	struct mca_bank_cache mca_caches[AMDGPU_MCA_ERROR_TYPE_DE];
> +	atomic_t ue_update_flag;
>  };
>
>  enum mca_reg_idx {
> --
> 2.34.1
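
For anyone following the thread, the once-per-interrupt gating in amdgpu_mca_bank_should_update() can be tried out in userspace with a rough sketch like the one below. It is not the driver code: C11 atomic_compare_exchange_strong() stands in for the kernel's atomic_cmpxchg(), the result of amdgpu_ras_intr_triggered() is passed in as a plain bool, and struct mca_gate, enum error_type, mca_gate_init() and bank_should_update() are hypothetical names used only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ue_update_flag member added to struct amdgpu_mca. */
struct mca_gate {
	atomic_int ue_update_flag;
};

/* Hypothetical stand-in for enum amdgpu_mca_error_type. */
enum error_type { ERROR_TYPE_UE, ERROR_TYPE_CE };

static void mca_gate_init(struct mca_gate *g)
{
	atomic_init(&g->ue_update_flag, 0);
}

/*
 * Returns true when the bank set should be refreshed.
 * intr_triggered models the return value of amdgpu_ras_intr_triggered().
 */
static bool bank_should_update(struct mca_gate *g, enum error_type type,
			       bool intr_triggered)
{
	int expected = 0;

	/* Only UE queries are gated; other types always update. */
	if (type != ERROR_TYPE_UE)
		return true;

	/* No RAS interrupt pending: re-arm the gate and allow the update. */
	if (!intr_triggered) {
		atomic_store(&g->ue_update_flag, 0);
		return true;
	}

	/* The first caller flips 0 -> 1 and wins; later callers see 1 and skip. */
	return atomic_compare_exchange_strong(&g->ue_update_flag, &expected, 1);
}
```

With the interrupt flagged, the first UE query returns true and every later one returns false until a query arrives with the interrupt cleared, which matches the "only updated once during the gpu recovery stage" comment in the patch.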