Re: [PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support

Dear Mohammad,


Am 28.03.22 um 11:49 schrieb Ziya, Mohammad zafar:

-----Original Message-----
From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Sent: Monday, March 28, 2022 3:08 PM

Am 28.03.22 um 10:47 schrieb Ziya, Mohammad zafar:

[…]

-----Original Message-----
From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Sent: Monday, March 28, 2022 1:39 PM

Am 28.03.22 um 10:00 schrieb Ziya, Mohammad zafar:

[…]

From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Sent: Monday, March 28, 2022 1:22 PM

Am 28.03.22 um 09:43 schrieb Zhou1, Tao:
-----Original Message-----
From: Ziya, Mohammad zafar <Mohammadzafar.Ziya@xxxxxxx>
Sent: Monday, March 28, 2022 2:25 PM

[…]

+static uint32_t vcn_v2_6_query_poison_by_instance(struct amdgpu_device *adev,
+			uint32_t instance, uint32_t sub_block) {
+	uint32_t poison_stat = 0, reg_value = 0;
+
+	switch (sub_block) {
+	case AMDGPU_VCN_V2_6_VCPU_VCODEC:
+		reg_value = RREG32_SOC15(VCN, instance, mmUVD_RAS_VCPU_VCODEC_STATUS);
+		poison_stat = REG_GET_FIELD(reg_value, UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF);
+		break;
+	default:
+		break;
+	};
+
+	if (poison_stat)
+		dev_info(adev->dev, "Poison detected in VCN%d, sub_block%d\n",
+			instance, sub_block);

What should a user do with that information? Faulty hardware, …?

[Mohammad]: This message helps to identify the faulty hardware. The
hardware ID is logged along with the poison notice, which helps to
identify the device among multiple ones installed in the system.

Thank you for clarifying. If it’s indeed faulty hardware, should the
log level be raised to error? Keep in mind that normal ignorant users
(like me) read the message, and it’d be great to guide them a little.
I guess they do not know what “Poison” means. Maybe:

A hardware corruption was found indicating the device might be faulty.
(Poison detected in VCN%d, sub_block%d)\n

(Keep in mind, I do not know anything about RAS.)
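As a rough sketch of that wording only: the helper below is hypothetical and runs in user space, it is not part of the patch. It just formats the suggested text so the result can be inspected. In the driver, the same format string would presumably be passed to `dev_err(adev->dev, …)` instead of `dev_info()`, assuming the error level turns out to be appropriate for a consumed poison.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical user-space helper (not in the patch): builds the reworded
 * message for a given VCN instance and sub-block. Returns the snprintf()
 * result, i.e. the number of characters the full message needs.
 */
static int format_poison_msg(char *buf, size_t len,
			     uint32_t instance, uint32_t sub_block)
{
	return snprintf(buf, len,
			"A hardware corruption was found indicating the device might be faulty. "
			"(Poison detected in VCN%" PRIu32 ", sub_block%" PRIu32 ")\n",
			instance, sub_block);
}
```

Keeping the original “Poison detected in VCN%d, sub_block%d” part in parentheses means that anybody grepping for the existing string would still find it.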

[Mohammad]: It is an error condition, but this is just an informational
message, which could have been ignored as well, because VCN only
consumed the poison; it did not create it.

Sorry, I have never seen these messages in `dmesg`, so could you please
give an example log showing what the user would see?


[Mohammad]: [  231.181316] amdgpu 0000:8a:00.0: amdgpu: Poison detected in VCN0, sub_block0

Sample message from amdgpu: "[  237.013029] amdgpu 0000:8a:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available"

Hmm, that is six seconds later, so if Linux logs other messages in between, I have no idea whether the connection will be made.

Both messages read like debug messages, and normal users will have no clue what to do. Can that be improved by rewording them?


Kind regards,

Paul


