Re: [PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dear Mohammad,


Am 28.03.22 um 10:00 schrieb Ziya, Mohammad zafar:

[…]

From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Sent: Monday, March 28, 2022 1:22 PM

[…]

Tao, it’s hard to find your reply in the quote, as the message is not quoted
correctly (> prepended). Is it possible to use a different email client?


Am 28.03.22 um 09:43 schrieb Zhou1, Tao:
-----Original Message-----
From: Ziya, Mohammad zafar <Mohammadzafar.Ziya@xxxxxxx>
Sent: Monday, March 28, 2022 2:25 PM

[…]

+static uint32_t vcn_v2_6_query_poison_by_instance(struct amdgpu_device *adev,
+			uint32_t instance, uint32_t sub_block) {
+	uint32_t poison_stat = 0, reg_value = 0;
+
+	switch (sub_block) {
+	case AMDGPU_VCN_V2_6_VCPU_VCODEC:
+		reg_value = RREG32_SOC15(VCN, instance, mmUVD_RAS_VCPU_VCODEC_STATUS);
+		poison_stat = REG_GET_FIELD(reg_value, UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF);
+		break;
+	default:
+		break;
+	};
+
+	if (poison_stat)
+		dev_info(adev->dev, "Poison detected in VCN%d, sub_block%d\n",
+			instance, sub_block);

What should a user do with that information? Faulty hardware, …?

[Mohammad]: This message will help to identify the faulty hardware,
the hardware ID will also log along with poison, help to identify
among multiple hardware installed on the system.

Thank you for clarifying. If it’s indeed faulty hardware, should the log level be increased to be an error? Keep in mind, that normal ignorant users (like me) are reading the message, and it’d be great to guide them a little. They do not know what “Poison“ means I guess. Maybe:

A hardware corruption was found indicating the device might be faulty. (Poison detected in VCN%d, sub_block%d)\n

(Keep in mind, I do not know anything about RAS.)

+
+	return poison_stat;
+}
+
+static bool vcn_v2_6_query_poison_status(struct amdgpu_device *adev) {
+	uint32_t inst, sub;
+	uint32_t poison_stat = 0;
+
+	for (inst = 0; inst < adev->vcn.num_vcn_inst; inst++)
+		for (sub = 0; sub < AMDGPU_VCN_V2_6_MAX_SUB_BLOCK; sub++)
+			poison_stat +=
+			vcn_v2_6_query_poison_by_instance(adev, inst, sub);
+
+	return poison_stat ? true : false;

[Tao] just want to confirm the logic, if the POISONED_PF of one instance is 1
and another is 0, the poison_stat is true, correct?
Can we add a print message for this case? Same question to JPEG.

[Mohammad]: Message will be printed on function block ahead of the function.

Is the `dev_info` message in `vcn_v2_6_query_poison_by_instance()` doing
what you want?

Also, instead of `poison_stat ? true : false;` a common pattern is
`!!poison_stat` I believe.

[…]

[Mohammad]: Noted the change. Will make to return !!poison_stat ? true : false;

No, just

    return !!poison_stat;

[…]


Kind regards,

Paul



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux