RE: [PATCH 4/4] drm/amdgpu: avoid dump mca bank log muti times during ras ISR

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only - General]

-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1@xxxxxxx>
Sent: Thursday, April 25, 2024 4:31 PM
To: Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Li, Candice <Candice.Li@xxxxxxx>
Subject: RE: [PATCH 4/4] drm/amdgpu: avoid dump mca bank log muti times during ras ISR

[AMD Official Use Only - General]

> -----Original Message-----
> From: Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
> Sent: Tuesday, April 23, 2024 4:27 PM
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Zhou1, Tao
> <Tao.Zhou1@xxxxxxx>; Li, Candice <Candice.Li@xxxxxxx>
> Subject: [PATCH 4/4] drm/amdgpu: avoid dump mca bank log muti times
> during ras ISR
>
> because the ue valid mca count will only be cleared after gpu reset,
> so only dump mca log on the first time to get mca bank after receive RAS interrupt.
>
> Signed-off-by: Yang Wang <kevinyang.wang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | 28
> +++++++++++++++++++++++++  drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h |
> 1 +
>  2 files changed, 29 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
> index 264f56fd4f66..b581523fa8d7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c
> @@ -229,6 +229,8 @@ int amdgpu_mca_init(struct amdgpu_device *adev)
>       struct mca_bank_cache *mca_cache;
>       int i;
>
> +     atomic_set(&mca->ue_update_flag, 0);
> +
>       for (i = 0; i < ARRAY_SIZE(mca->mca_caches); i++) {
>               mca_cache = &mca->mca_caches[i];
>               mutex_init(&mca_cache->lock); @@ -244,6 +246,8 @@ void
> amdgpu_mca_fini(struct amdgpu_device *adev)
>       struct mca_bank_cache *mca_cache;
>       int i;
>
> +     atomic_set(&mca->ue_update_flag, 0);
> +
>       for (i = 0; i < ARRAY_SIZE(mca->mca_caches); i++) {
>               mca_cache = &mca->mca_caches[i];
>               amdgpu_mca_bank_set_release(&mca_cache->mca_set);
> @@ -325,6 +329,27 @@ static int amdgpu_mca_smu_get_mca_entry(struct
> amdgpu_device *adev, enum amdgpu_
>       return mca_funcs->mca_get_mca_entry(adev, type, idx, entry);  }
>
> +static bool amdgpu_mca_bank_should_update(struct amdgpu_device *adev,
> +enum amdgpu_mca_error_type type) {
> +     struct amdgpu_mca *mca = &adev->mca;
> +     bool ret = true;
> +
> +     /*
> +      * Because the UE Valid MCA count will only be cleared after reset,
> +      * in order to avoid repeated counting of the error count,
> +      * the aca bank is only updated once during the gpu recovery stage.
> +      */
> +     if (type == AMDGPU_MCA_ERROR_TYPE_UE) {
> +             if (amdgpu_ras_intr_triggered())
> +                     ret = atomic_cmpxchg(&mca->ue_update_flag, 0, 1)
> + ==
> 0;
> +             else
> +                     atomic_set(&mca->ue_update_flag, 0);
> +     }
> +
> +     return ret;
> +}
> +
> +

[Tao] redundant line, with this fixed, the patch is:

Reviewed-by: Tao Zhou <tao.zhou1@xxxxxxx>

[Kevin]:
Thanks for reminding me.

Best Regards,
Kevin

>  static int amdgpu_mca_smu_get_mca_set(struct amdgpu_device *adev,
> enum amdgpu_mca_error_type type, struct mca_bank_set *mca_set,
>                                     struct ras_query_context *qctx)  {
> @@ -
> 335,6 +360,9 @@ static int amdgpu_mca_smu_get_mca_set(struct
> amdgpu_device *adev, enum amdgpu_mc
>       if (!mca_set)
>               return -EINVAL;
>
> +     if (!amdgpu_mca_bank_should_update(adev, type))
> +             return 0;
> +
>       ret = amdgpu_mca_smu_get_valid_mca_count(adev, type, &count);
>       if (ret)
>               return ret;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h
> index 9b97cfa28e05..e80323ff90c1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.h
> @@ -93,6 +93,7 @@ struct amdgpu_mca {
>       struct amdgpu_mca_ras mpio;
>       const struct amdgpu_mca_smu_funcs *mca_funcs;
>       struct mca_bank_cache mca_caches[AMDGPU_MCA_ERROR_TYPE_DE];
> +     atomic_t ue_update_flag;
>  };
>
>  enum mca_reg_idx {
> --
> 2.34.1






[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux