[AMD Official Use Only]

Dear Paul,

Comments inline.

Regards,
Zafar

>-----Original Message-----
>From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
>Sent: Monday, March 28, 2022 3:08 PM
>To: Ziya, Mohammad zafar <Mohammadzafar.Ziya@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>
>Cc: Lazar, Lijo <Lijo.Lazar@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
>Subject: Re: [PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support
>
>
>Dear Mohammad,
>
>
>On 28.03.22 at 10:47, Ziya, Mohammad zafar wrote:
>
>[…]
>
>>> -----Original Message-----
>>> From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
>>> Sent: Monday, March 28, 2022 1:39 PM
>
>>> On 28.03.22 at 10:00, Ziya, Mohammad zafar wrote:
>>>
>>> […]
>>>
>>>>> From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
>>>>> Sent: Monday, March 28, 2022 1:22 PM
>
>>>>> On 28.03.22 at 09:43, Zhou1, Tao wrote:
>>>>>> -----Original Message-----
>>>>>> From: Ziya, Mohammad zafar <Mohammadzafar.Ziya@xxxxxxx>
>>>>>> Sent: Monday, March 28, 2022 2:25 PM
>>>
>>> […]
>>>
>>>>>> +static uint32_t vcn_v2_6_query_poison_by_instance(struct amdgpu_device *adev,
>>>>>> +		uint32_t instance, uint32_t sub_block) {
>>>>>> +	uint32_t poison_stat = 0, reg_value = 0;
>>>>>> +
>>>>>> +	switch (sub_block) {
>>>>>> +	case AMDGPU_VCN_V2_6_VCPU_VCODEC:
>>>>>> +		reg_value = RREG32_SOC15(VCN, instance, mmUVD_RAS_VCPU_VCODEC_STATUS);
>>>>>> +		poison_stat = REG_GET_FIELD(reg_value, UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		break;
>>>>>> +	};
>>>>>> +
>>>>>> +	if (poison_stat)
>>>>>> +		dev_info(adev->dev, "Poison detected in VCN%d, sub_block%d\n",
>>>>>> +			instance, sub_block);
>>>>>
>>>>> What should a user do with that information? Faulty hardware, …?
>>>>
>>>> [Mohammad]: This message will help to identify the faulty hardware.
>>>> The hardware ID is also logged along with the poison, which helps to
>>>> identify the device among multiple devices installed on the system.
>>>
>>> Thank you for clarifying. If it’s indeed faulty hardware, should the
>>> log level be raised to error? Keep in mind that normal ignorant
>>> users (like me) are reading the message, and it’d be great to guide
>>> them a little. They do not know what “poison” means, I guess. Maybe:
>>>
>>>   A hardware corruption was found indicating the device might be faulty.
>>>   (Poison detected in VCN%d, sub_block%d)\n
>>>
>>> (Keep in mind, I do not know anything about RAS.)
>>
>> [Mohammad]: It is an error condition, but this is just an informational
>> message that could also have been ignored, because VCN only consumed
>> the poison; it did not create it.
>
>Sorry, I have never seen these messages in `dmesg`. Could you please give
>an example log of what the user would see?
>

[Mohammad]: Here is an example:

[ 231.181316] amdgpu 0000:8a:00.0: amdgpu: Poison detected in VCN0, sub_block0

Another sample message from amdgpu:

[ 237.013029] amdgpu 0000:8a:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available

>
>Kind regards,
>
>Paul
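For readers unfamiliar with the RAS code, here is a minimal user-space sketch of what the quoted vcn_v2_6_query_poison_by_instance() does. It masks and shifts the POISONED_PF field out of the status register value, and if that field is set it builds the log line. Everything prefixed sketch_/SKETCH_ is hypothetical. The bit position used for POISONED_PF is an assumption for illustration, not the real VCN 2.6 register layout. The register read (RREG32_SOC15()) is replaced by a plain parameter and dev_info() by an output buffer, so the logic compiles outside the kernel. The field extraction is only modeled on the mask-then-shift pattern of amdgpu's REG_GET_FIELD().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * HYPOTHETICAL field layout for illustration only: the real mask/shift of
 * UVD_RAS_VCPU_VCODEC_STATUS.POISONED_PF come from the generated VCN
 * register headers and are not reproduced here.
 */
#define UVD_RAS_VCPU_VCODEC_STATUS__POISONED_PF__SHIFT 0x1f
#define UVD_RAS_VCPU_VCODEC_STATUS__POISONED_PF_MASK 0x80000000U

/* Hypothetical sub-block ID standing in for AMDGPU_VCN_V2_6_VCPU_VCODEC. */
#define SKETCH_VCN_VCPU_VCODEC 0U

/* Mask-then-shift field extraction, modeled on amdgpu's REG_GET_FIELD(). */
#define SKETCH_GET_FIELD(value, reg, field) \
	(((value) & reg##__##field##_MASK) >> reg##__##field##__SHIFT)

/*
 * User-space stand-in for vcn_v2_6_query_poison_by_instance(): the status
 * register value is passed in instead of being read from hardware, and the
 * log line is written to buf instead of being printed with dev_info().
 * Returns the POISONED_PF field (nonzero if poison was consumed).
 */
static uint32_t sketch_query_poison(uint32_t reg_value, uint32_t instance,
				    uint32_t sub_block, char *buf, size_t len)
{
	uint32_t poison_stat = 0;

	assert(buf != NULL && len > 0);
	memset(buf, 0, len); /* empty string means "nothing logged" */

	switch (sub_block) {
	case SKETCH_VCN_VCPU_VCODEC:
		poison_stat = SKETCH_GET_FIELD(reg_value,
					       UVD_RAS_VCPU_VCODEC_STATUS,
					       POISONED_PF);
		break;
	default:
		/* Unknown sub-blocks report no poison, as in the patch. */
		break;
	}

	if (poison_stat)
		snprintf(buf, len,
			 "Poison detected in VCN%" PRIu32 ", sub_block%" PRIu32,
			 instance, sub_block);

	return poison_stat;
}
```

Calling sketch_query_poison(0x80000000U, 0, SKETCH_VCN_VCPU_VCODEC, buf, sizeof(buf)) returns 1 and fills buf with "Poison detected in VCN0, sub_block0", the same text as in the sample log line above. The sketch prints with PRIu32 because instance and sub_block are unsigned 32-bit values.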