Dear Mohammad,
Am 28.03.22 um 10:47 schrieb Ziya, Mohammad zafar:
[…]
-----Original Message-----
From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Sent: Monday, March 28, 2022 1:39 PM
Am 28.03.22 um 10:00 schrieb Ziya, Mohammad zafar:
[…]
From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Sent: Monday, March 28, 2022 1:22 PM
Am 28.03.22 um 09:43 schrieb Zhou1, Tao:
-----Original Message-----
From: Ziya, Mohammad zafar <Mohammadzafar.Ziya@xxxxxxx>
Sent: Monday, March 28, 2022 2:25 PM
[…]
+static uint32_t vcn_v2_6_query_poison_by_instance(struct amdgpu_device *adev,
+ uint32_t instance, uint32_t sub_block) {
+ uint32_t poison_stat = 0, reg_value = 0;
+
+ switch (sub_block) {
+ case AMDGPU_VCN_V2_6_VCPU_VCODEC:
+ reg_value = RREG32_SOC15(VCN, instance, mmUVD_RAS_VCPU_VCODEC_STATUS);
+ poison_stat = REG_GET_FIELD(reg_value, UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF);
+ break;
+ default:
+ break;
+ }
+
+ if (poison_stat)
+ dev_info(adev->dev, "Poison detected in VCN%d, sub_block%d\n",
+ instance, sub_block);
What should a user do with that information? Faulty hardware, …?
[Mohammad]: This message helps to identify the faulty hardware. The
hardware ID is also logged along with the poison, which helps to identify
the affected device when multiple devices are installed on the system.
Thank you for clarifying. If it’s indeed faulty hardware, should the log level be
raised to an error? Keep in mind that normal, ignorant users (like me)
are reading the message, and it’d be great to guide them a little. I guess they
do not know what “Poison” means. Maybe:
A hardware corruption was found indicating the device might be faulty.
(Poison detected in VCN%d, sub_block%d)\n
(Keep in mind, I do not know anything about RAS.)
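To make the suggested wording concrete, here is a minimal user-space sketch of how it would be formatted. It is only an illustration: `format_poison_msg()` is a hypothetical helper that is not part of the patch, and the sketch uses `snprintf()` rather than the kernel's `dev_info()`/`dev_err()`, so it shows the message text only, not what an actual `dmesg` line (with its driver and device prefix) would look like.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): formats the proposed wording
 * into buf. Both arguments are uint32_t in the patch, so an unsigned
 * conversion (PRIu32) is used here instead of %d.
 * Returns the snprintf() result: the number of characters that would
 * have been written, excluding the terminating NUL.
 */
static int format_poison_msg(char *buf, size_t len,
			     uint32_t instance, uint32_t sub_block)
{
	return snprintf(buf, len,
			"A hardware corruption was found indicating the device might be faulty. "
			"(Poison detected in VCN%" PRIu32 ", sub_block%" PRIu32 ")\n",
			instance, sub_block);
}
```

Called with a 160-byte buffer, instance 1 and sub_block 0, the helper fills the buffer with the proposed sentence followed by "(Poison detected in VCN1, sub_block0)" and a newline.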
[Mohammad]: It is an error condition, but this is just an informational
message, which could also have been omitted, because VCN only
consumed the poison and did not create it.
Sorry, I have never seen this message in `dmesg`. Could you please give an
example of the log output the user would see?
Kind regards,
Paul