[AMD Official Use Only] Dear Paul, Thank you for review. Comment added inline. Regards, Zafar >-----Original Message----- >From: Paul Menzel <pmenzel@xxxxxxxxxxxxx> >Sent: Monday, March 28, 2022 1:39 PM >To: Ziya, Mohammad zafar <Mohammadzafar.Ziya@xxxxxxx>; Zhou1, Tao ><Tao.Zhou1@xxxxxxx> >Cc: Lazar, Lijo <Lijo.Lazar@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, >Hawking <Hawking.Zhang@xxxxxxx> >Subject: Re: [PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support > >Dear Mohammad, > > >Am 28.03.22 um 10:00 schrieb Ziya, Mohammad zafar: > >[…] > >>> From: Paul Menzel <pmenzel@xxxxxxxxxxxxx> >>> Sent: Monday, March 28, 2022 1:22 PM > >[…] > >>> Tao, it’s hard to find your reply in the quote, as the message is not >>> quoted correctly (> prepended). Is it possible to use a different email >client? >>> >>> >>> Am 28.03.22 um 09:43 schrieb Zhou1, Tao: >>>> -----Original Message----- >>>> From: Ziya, Mohammad zafar <Mohammadzafar.Ziya@xxxxxxx> >>>> Sent: Monday, March 28, 2022 2:25 PM > >[…] > >>>> +static uint32_t vcn_v2_6_query_poison_by_instance(struct >amdgpu_device *adev, >>>> + uint32_t instance, uint32_t sub_block) { >>>> + uint32_t poison_stat = 0, reg_value = 0; >>>> + >>>> + switch (sub_block) { >>>> + case AMDGPU_VCN_V2_6_VCPU_VCODEC: >>>> + reg_value = RREG32_SOC15(VCN, instance, >mmUVD_RAS_VCPU_VCODEC_STATUS); >>>> + poison_stat = REG_GET_FIELD(reg_value, >UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF); >>>> + break; >>>> + default: >>>> + break; >>>> + }; >>>> + >>>> + if (poison_stat) >>>> + dev_info(adev->dev, "Poison detected in VCN%d, >sub_block%d\n", >>>> + instance, sub_block); >>> >>> What should a user do with that information? Faulty hardware, …? >> >> [Mohammad]: This message will help to identify the faulty hardware, >> the hardware ID will also log along with poison, help to identify >> among multiple hardware installed on the system. > >Thank you for clarifying. If it’s indeed faulty hardware, should the log level be >increased to be an error? Keep in mind, that normal ignorant users (like me) >are reading the message, and it’d be great to guide them a little. They do not >know what “Poison“ means I guess. Maybe: > >A hardware corruption was found indicating the device might be faulty. >(Poison detected in VCN%d, sub_block%d)\n > >(Keep in mind, I do not know anything about RAS.) [Mohammad]: It is an error condition, but this is just an information message which could have been ignored as well because VCN just consumed the poison, not created. > >>>> + >>>> + return poison_stat; >>>> +} >>>> + >>>> +static bool vcn_v2_6_query_poison_status(struct amdgpu_device >*adev) { >>>> + uint32_t inst, sub; >>>> + uint32_t poison_stat = 0; >>>> + >>>> + for (inst = 0; inst < adev->vcn.num_vcn_inst; inst++) >>>> + for (sub = 0; sub < AMDGPU_VCN_V2_6_MAX_SUB_BLOCK; >sub++) >>>> + poison_stat += >>>> + vcn_v2_6_query_poison_by_instance(adev, inst, >sub); >>>> + >>>> + return poison_stat ? true : false; >>>> >>>> [Tao] just want to confirm the logic, if the POISONED_PF of one >>>> instance is 1 >>> and another is 0, the poison_stat is true, correct? >>>> Can we add a print message for this case? Same question to JPEG. >> >> [Mohammad]: Message will be printed on function block ahead of the >function. >> >>> Is the `dev_info` message in `vcn_v2_6_query_poison_by_instance()` >>> doing what you want? >>> >>> Also, instead of `poison_stat ? true : false;` a common pattern is >>> `!!poison_stat` I believe. > >[…] > >> [Mohammad]: Noted the change. Will make to return !!poison_stat ? true >> : false; > >No, just > > return !!poison_stat; [Mohammad]: Noted. I realized !!poison_stat is enough after sending the reply. > >[…] > > >Kind regards, > >Paul