Re: [PATCH] drm/amd/pm: correct the checks for fan attributes support

On 1/11/2022 8:02 AM, Quan, Evan wrote:
[AMD Official Use Only]



-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Sent: Monday, January 10, 2022 4:31 PM
To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amd/pm: correct the checks for fan attributes support



On 1/10/2022 1:25 PM, Quan, Evan wrote:
[AMD Official Use Only]



-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Sent: Monday, January 10, 2022 3:36 PM
To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amd/pm: correct the checks for fan attributes support



On 1/10/2022 11:30 AM, Evan Quan wrote:
Previously we relied on the return values from the corresponding interfaces,
which is inefficient. Also, the wrong intermediate variable used there leaves
the fan stuck in manual mode, which then causes overheating in 3D graphics
tests.

Signed-off-by: Evan Quan <evan.quan@xxxxxxx>
Change-Id: Ia93ccf3b929c12e6d10b50c8f3596783ac63f0e3
---
    drivers/gpu/drm/amd/pm/amdgpu_dpm.c     | 23 +++++++++++++++++++++++
    drivers/gpu/drm/amd/pm/amdgpu_pm.c      | 20 ++++++++++----------
    drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 12 ++++++++++++
    3 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 68d2e80a673b..e732418a9558 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -1547,3 +1547,26 @@ int amdgpu_dpm_get_dpm_clock_table(struct amdgpu_device *adev,

    	return ret;
    }
+
+int amdgpu_dpm_is_fan_operation_supported(struct amdgpu_device *adev,
+					  enum fan_operation_id id)
+{
+	const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+	switch (id) {
+	case FAN_CONTROL_MODE_RETRIEVING:
+		return pp_funcs->get_fan_control_mode ? 1 : 0;
+	case FAN_CONTROL_MODE_SETTING:
+		return pp_funcs->set_fan_control_mode ? 1 : 0;
+	case FAN_SPEED_PWM_RETRIEVING:
+		return pp_funcs->get_fan_speed_pwm ? 1 : 0;
+	case FAN_SPEED_PWM_SETTING:
+		return pp_funcs->set_fan_speed_pwm ? 1 : 0;
+	case FAN_SPEED_RPM_RETRIEVING:
+		return pp_funcs->get_fan_speed_rpm ? 1 : 0;
+	case FAN_SPEED_RPM_SETTING:
+		return pp_funcs->set_fan_speed_rpm ? 1 : 0;
+	default:
+		return 0;
+	}
+}
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index d3eab245e0fe..57721750c51a 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -3263,15 +3263,15 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj,
    		return 0;

    	/* mask fan attributes if we have no bindings for this asic to expose */
-	if (((amdgpu_dpm_get_fan_speed_pwm(adev, &speed) == -EINVAL) &&
+	if ((!amdgpu_dpm_is_fan_operation_supported(adev, FAN_SPEED_PWM_RETRIEVING) &&

As per the current logic, it's really checking the hardware registers.
[Quan, Evan] I should probably mention that the "current" version you see now
is actually a regression introduced by the commit below:
801771de0331 drm/amd/pm: do not expose power implementation details to amdgpu_pm.c

The earlier version (which worked well) looked something like this:
-       if (!is_support_sw_smu(adev)) {
-               /* mask fan attributes if we have no bindings for this asic to expose */
-               if ((!adev->powerplay.pp_funcs->get_fan_speed_pwm &&
-                    attr == &sensor_dev_attr_pwm1.dev_attr.attr) || /* can't query fan */
-                   (!adev->powerplay.pp_funcs->get_fan_control_mode &&
-                    attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr)) /* can't query state */
-                       effective_mode &= ~S_IRUGO;

So the changes here really just return to the old, working version. The aim is to
provide a quick fix for the failures reported by CQE.

I see. Could you model it on the change below? That is preferable to
introducing a new API.

drm/amdgpu/pm: Don't show pp_power_profile_mode for unsupported devices.
[Quan, Evan] In fact, that piece of code from the mentioned change was updated as below:
         } else if (DEVICE_ATTR_IS(pp_power_profile_mode)) {
                 if (amdgpu_dpm_get_power_profile_mode(adev, NULL) == -EOPNOTSUPP)
                         *states = ATTR_STATE_UNSUPPORTED;
         }
Because access to adev->powerplay.pp_funcs from amdgpu_pm.c is forbidden after the pm cleanups,
we have to rely on some (new) API in amdgpu_dpm.c to do those checks.
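A minimal user-space sketch of that layering, assuming mock types: mock_device, mock_pp_funcs and mock_dpm_is_fan_operation_supported are hypothetical stand-ins, not the real amdgpu structures, and only the fan_operation_id values are taken from the patch. Only the "dpm" layer looks at the callback table; the sysfs side asks whether a callback exists instead of calling it, so the query cannot touch the fan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; NOT the real amdgpu types. */
struct mock_pp_funcs {
	int (*get_fan_control_mode)(void *handle, uint32_t *mode);
	int (*set_fan_control_mode)(void *handle, uint32_t mode);
	int (*get_fan_speed_pwm)(void *handle, uint32_t *pwm);
	int (*set_fan_speed_pwm)(void *handle, uint32_t pwm);
	int (*get_fan_speed_rpm)(void *handle, uint32_t *rpm);
	int (*set_fan_speed_rpm)(void *handle, uint32_t rpm);
};

struct mock_device {
	/* Only the "dpm" layer below is allowed to dereference this. */
	const struct mock_pp_funcs *pp_funcs;
};

/* Same operation IDs as the enum added to amdgpu_dpm.h by the patch. */
enum fan_operation_id {
	FAN_CONTROL_MODE_RETRIEVING = 0,
	FAN_CONTROL_MODE_SETTING    = 1,
	FAN_SPEED_PWM_RETRIEVING    = 2,
	FAN_SPEED_PWM_SETTING       = 3,
	FAN_SPEED_RPM_RETRIEVING    = 4,
	FAN_SPEED_RPM_SETTING       = 5,
};

/*
 * "dpm layer" query: reports whether a callback is wired up without
 * invoking it. The NULL check on the table itself is an extra guard
 * added for the mock.
 */
bool mock_dpm_is_fan_operation_supported(const struct mock_device *dev,
					 enum fan_operation_id id)
{
	const struct mock_pp_funcs *f = dev->pp_funcs;

	if (!f)
		return false;

	switch (id) {
	case FAN_CONTROL_MODE_RETRIEVING:
		return f->get_fan_control_mode != NULL;
	case FAN_CONTROL_MODE_SETTING:
		return f->set_fan_control_mode != NULL;
	case FAN_SPEED_PWM_RETRIEVING:
		return f->get_fan_speed_pwm != NULL;
	case FAN_SPEED_PWM_SETTING:
		return f->set_fan_speed_pwm != NULL;
	case FAN_SPEED_RPM_RETRIEVING:
		return f->get_fan_speed_rpm != NULL;
	case FAN_SPEED_RPM_SETTING:
		return f->set_fan_speed_rpm != NULL;
	default:
		return false;
	}
}

/* Example backend that can only read the fan control mode. */
static int demo_get_mode(void *handle, uint32_t *mode)
{
	(void)handle;
	*mode = 0;
	return 0;
}

const struct mock_pp_funcs demo_read_only_funcs = {
	.get_fan_control_mode = demo_get_mode,
};
```

With a device using demo_read_only_funcs, only FAN_CONTROL_MODE_RETRIEVING reports as supported. Returning bool rather than the patch's int 1/0 is a stylistic choice of this sketch.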


To be clear, the model is to use a dummy call to check whether the API is implemented:
	
	amdgpu_dpm_get_fan_speed_rpm(adev, NULL) == -EOPNOTSUPP
	amdgpu_dpm_set_fan_speed_rpm(adev, -1) == -EOPNOTSUPP

That is better than adding another API and flags for each set/get API.
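A sketch of the dummy-call model, under stated assumptions: mock_device, mock_fan_funcs and the mock_* wrappers are hypothetical, and the wrappers' early rejection of the dummy arguments (NULL, and -1 passed as an unsigned 32-bit value) is an assumption of this sketch rather than a description of the current amdgpu_dpm.c. The probe is only side-effect free if the missing-callback check comes first and the dummy arguments are rejected before the backend runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mock_device;

/* Hypothetical backend callbacks; NOT the real amd_pm_funcs. */
struct mock_fan_funcs {
	int (*get_fan_speed_rpm)(struct mock_device *dev, uint32_t *rpm);
	int (*set_fan_speed_rpm)(struct mock_device *dev, uint32_t rpm);
};

struct mock_device {
	const struct mock_fan_funcs *funcs;
	uint32_t target_rpm;    /* state a real setter would change */
	unsigned int set_calls; /* how many times the setter actually ran */
};

/*
 * Wrappers in the "dpm" layer. The missing-callback check comes first,
 * so -EOPNOTSUPP is returned whatever arguments were passed. Rejecting
 * the dummy arguments before calling the backend is an assumption of
 * this sketch; without it the probe would reach the hardware.
 */
int mock_dpm_get_fan_speed_rpm(struct mock_device *dev, uint32_t *rpm)
{
	if (!dev->funcs->get_fan_speed_rpm)
		return -EOPNOTSUPP;
	if (!rpm)
		return -EINVAL;
	return dev->funcs->get_fan_speed_rpm(dev, rpm);
}

int mock_dpm_set_fan_speed_rpm(struct mock_device *dev, uint32_t rpm)
{
	if (!dev->funcs->set_fan_speed_rpm)
		return -EOPNOTSUPP;
	if (rpm == UINT32_MAX)
		return -EINVAL;
	return dev->funcs->set_fan_speed_rpm(dev, rpm);
}

/* The probes quoted above: only -EOPNOTSUPP means "not implemented". */
bool mock_fan_rpm_get_supported(struct mock_device *dev)
{
	return mock_dpm_get_fan_speed_rpm(dev, NULL) != -EOPNOTSUPP;
}

bool mock_fan_rpm_set_supported(struct mock_device *dev)
{
	return mock_dpm_set_fan_speed_rpm(dev, (uint32_t)-1) != -EOPNOTSUPP;
}

/* Example backend that can set, but not read, the fan RPM. */
static int demo_set_rpm(struct mock_device *dev, uint32_t rpm)
{
	dev->target_rpm = rpm;
	dev->set_calls++;
	return 0;
}

const struct mock_fan_funcs demo_set_only_funcs = {
	.set_fan_speed_rpm = demo_set_rpm,
};
```

With demo_set_only_funcs, the get probe reports "unsupported", the set probe reports "supported", and the setter never actually runs during probing.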

Thanks,
Lijo

A cleaner way to handle all those attribute-support checks would be a flag like "adev->pm.sysfs_attributes_flags"
that records which sysfs attributes are supported on each ASIC. However, considering the ASICs involved and the differences between them, that may not be an easy job.
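A rough sketch of what such a flags field might look like, assuming hypothetical names throughout: the MOCK_ATTR_* bits, struct mock_pm and the helpers are illustrative only and do not exist in the driver. The bits are filled once at init from a per-ASIC default, with a per-board mask to clear, since fan control can differ between SKUs of the same ASIC. The visibility check then only tests bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute bits; names are illustrative, not from the driver. */
#define MOCK_ATTR_PWM1_READ         (1u << 0)
#define MOCK_ATTR_PWM1_WRITE        (1u << 1)
#define MOCK_ATTR_PWM1_ENABLE_READ  (1u << 2)
#define MOCK_ATTR_PWM1_ENABLE_WRITE (1u << 3)
#define MOCK_ATTR_FAN1_INPUT        (1u << 4)
#define MOCK_ATTR_FAN1_TARGET       (1u << 5)

struct mock_pm {
	uint32_t sysfs_attributes_flags; /* filled once at init */
};

/* Start from a per-ASIC default and clear the bits a given board lacks. */
void mock_pm_init_attr_flags(struct mock_pm *pm, uint32_t asic_default,
			     uint32_t sku_unsupported)
{
	pm->sysfs_attributes_flags = asic_default & ~sku_unsupported;
}

/* True only if every requested bit is set. */
bool mock_pm_attr_supported(const struct mock_pm *pm, uint32_t flags)
{
	return (pm->sysfs_attributes_flags & flags) == flags;
}

/*
 * Mode for the pwm1 file, mirroring the masking in
 * hwmon_attributes_visible(): starting from 0644, drop the read bits
 * (S_IRUGO, 0444) if the value can't be queried and the owner write bit
 * (S_IWUSR, 0200) if it can't be set.
 */
unsigned int mock_pwm1_mode(const struct mock_pm *pm)
{
	unsigned int mode = 0644;

	if (!mock_pm_attr_supported(pm, MOCK_ATTR_PWM1_READ))
		mode &= ~0444u;
	if (!mock_pm_attr_supported(pm, MOCK_ATTR_PWM1_WRITE))
		mode &= ~0200u;
	return mode;
}
```

For a board whose ASIC default includes PWM read and write but whose SKU mask clears PWM write, mock_pwm1_mode() yields 0444 (read-only).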

BR
Evan

Thanks,
Lijo

For example, some SKUs could have PMFW-based fan control, while for other
SKUs, AIBs could use a different cooling solution that doesn't make use of
PMFW.


    	      attr == &sensor_dev_attr_pwm1.dev_attr.attr) || /* can't query fan */
-	    ((amdgpu_dpm_get_fan_control_mode(adev, &speed) == -EOPNOTSUPP) &&
+	    (!amdgpu_dpm_is_fan_operation_supported(adev, FAN_CONTROL_MODE_RETRIEVING) &&
    	     attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr)) /* can't query state */
    		effective_mode &= ~S_IRUGO;

-	if (((amdgpu_dpm_set_fan_speed_pwm(adev, speed) == -EINVAL) &&
+	if ((!amdgpu_dpm_is_fan_operation_supported(adev, FAN_SPEED_PWM_SETTING) &&
    	      attr == &sensor_dev_attr_pwm1.dev_attr.attr) || /* can't manage fan */
-	      ((amdgpu_dpm_set_fan_control_mode(adev, speed) == -EOPNOTSUPP) &&
+	    (!amdgpu_dpm_is_fan_operation_supported(adev, FAN_CONTROL_MODE_SETTING) &&
    	      attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr)) /* can't manage state */
    		effective_mode &= ~S_IWUSR;

@@ -3291,16 +3291,16 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj,
    		return 0;

    	/* hide max/min values if we can't both query and manage the fan */
-	if (((amdgpu_dpm_set_fan_speed_pwm(adev, speed) == -EINVAL) &&
-	      (amdgpu_dpm_get_fan_speed_pwm(adev, &speed) == -EINVAL) &&
-	      (amdgpu_dpm_set_fan_speed_rpm(adev, speed) == -EINVAL) &&
-	      (amdgpu_dpm_get_fan_speed_rpm(adev, &speed) == -EINVAL)) &&
+	if ((!amdgpu_dpm_is_fan_operation_supported(adev, FAN_SPEED_PWM_SETTING) &&
+	     !amdgpu_dpm_is_fan_operation_supported(adev, FAN_SPEED_PWM_RETRIEVING) &&
+	     !amdgpu_dpm_is_fan_operation_supported(adev, FAN_SPEED_RPM_SETTING) &&
+	     !amdgpu_dpm_is_fan_operation_supported(adev, FAN_SPEED_RPM_RETRIEVING)) &&

If this is the case, I think we should set pm.no_fan, since nothing is possible.
[Quan, Evan] Yep, I agree a more optimized version should look something like that.
Let's take this as a quick solution and do further optimizations later.
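A minimal sketch of the pm.no_fan idea, assuming a hypothetical mock state (struct mock_pm, enum mock_attr and the helpers are not the real struct amdgpu_pm or hwmon code): the four capability results are combined once at init, and the visibility callback then hides every fan attribute with a single flag test instead of re-running the checks per attribute.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock state; not the real struct amdgpu_pm. */
struct mock_pm {
	bool pwm_get, pwm_set, rpm_get, rpm_set; /* capability query results */
	bool no_fan;
};

enum mock_attr {
	MOCK_PWM1, MOCK_PWM1_ENABLE, MOCK_PWM1_MIN, MOCK_PWM1_MAX,
	MOCK_FAN1_INPUT, MOCK_FAN1_MIN, MOCK_FAN1_MAX,
	MOCK_TEMP1_INPUT,
};

/*
 * Decide once at init: if no fan speed operation is possible, mark the
 * device fan-less. It only ever sets the flag, so a no_fan already set
 * elsewhere is never cleared.
 */
void mock_pm_update_no_fan(struct mock_pm *pm)
{
	if (!pm->pwm_get && !pm->pwm_set && !pm->rpm_get && !pm->rpm_set)
		pm->no_fan = true;
}

/* In this mock, every attribute except temp1_input is a fan attribute. */
static bool mock_is_fan_attr(enum mock_attr attr)
{
	return attr != MOCK_TEMP1_INPUT;
}

/* Visibility callback shape: 0 hides the attribute, otherwise its mode. */
unsigned int mock_attr_visible(const struct mock_pm *pm, enum mock_attr attr,
			       unsigned int mode)
{
	if (pm->no_fan && mock_is_fan_attr(attr))
		return 0;
	return mode;
}
```

A device with no fan callbacks at all ends up with no_fan set, so pwm1_max and friends are hidden while temperature attributes keep their mode.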

BR
Evan

Thanks,
Lijo

    	    (attr == &sensor_dev_attr_pwm1_max.dev_attr.attr ||
    	     attr == &sensor_dev_attr_pwm1_min.dev_attr.attr))
    		return 0;

-	if ((amdgpu_dpm_set_fan_speed_rpm(adev, speed) == -EINVAL) &&
-	     (amdgpu_dpm_get_fan_speed_rpm(adev, &speed) == -EINVAL) &&
+	if ((!amdgpu_dpm_is_fan_operation_supported(adev, FAN_SPEED_RPM_SETTING) &&
+	     !amdgpu_dpm_is_fan_operation_supported(adev, FAN_SPEED_RPM_RETRIEVING)) &&
    	     (attr == &sensor_dev_attr_fan1_max.dev_attr.attr ||
    	     attr == &sensor_dev_attr_fan1_min.dev_attr.attr))
    		return 0;
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
index ba857ca75392..9e18151a3c46 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
@@ -330,6 +330,16 @@ struct amdgpu_pm {
    	bool			pp_force_state_enabled;
    };

+enum fan_operation_id
+{
+	FAN_CONTROL_MODE_RETRIEVING = 0,
+	FAN_CONTROL_MODE_SETTING    = 1,
+	FAN_SPEED_PWM_RETRIEVING    = 2,
+	FAN_SPEED_PWM_SETTING       = 3,
+	FAN_SPEED_RPM_RETRIEVING    = 4,
+	FAN_SPEED_RPM_SETTING       = 5,
+};
+
    u32 amdgpu_dpm_get_vblank_time(struct amdgpu_device *adev);
    int amdgpu_dpm_read_sensor(struct amdgpu_device *adev, enum amd_pp_sensors sensor,
    			   void *data, uint32_t *size);
@@ -510,4 +520,6 @@ enum pp_smu_status amdgpu_dpm_get_uclk_dpm_states(struct amdgpu_device *adev,
    						  unsigned int *num_states);
    int amdgpu_dpm_get_dpm_clock_table(struct amdgpu_device *adev,
    				   struct dpm_clocks *clock_table);
+int amdgpu_dpm_is_fan_operation_supported(struct amdgpu_device *adev,
+					  enum fan_operation_id id);
    #endif
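The commit message attributes the overheating to a wrong intermediate variable. As a hedged illustration only, here is a user-space sketch with mock types (mock_fan, the mock callbacks and both probe helpers are hypothetical, and the real driver's call order and fan behaviour may differ). It mirrors the removed hwmon_attributes_visible() lines, where a getter fills speed and a setter is then called with it: the capability probe writes to the device. Making a PWM write switch the fan to manual mode is an assumption of the mock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock fan modes; the values are chosen for illustration only. */
enum mock_fan_mode { MOCK_FAN_NONE = 0, MOCK_FAN_MANUAL = 1, MOCK_FAN_AUTO = 2 };

struct mock_fan {
	uint32_t mode;
	uint32_t pwm; /* 0..255 duty cycle */
};

static int mock_get_fan_control_mode(struct mock_fan *fan, uint32_t *mode)
{
	*mode = fan->mode;
	return 0;
}

/* Mock assumption: writing a PWM value takes the fan out of auto mode. */
static int mock_set_fan_speed_pwm(struct mock_fan *fan, uint32_t pwm)
{
	if (pwm > 255)
		return -EINVAL;
	fan->pwm = pwm;
	fan->mode = MOCK_FAN_MANUAL;
	return 0;
}

/*
 * Anti-pattern: "is the setter implemented?" answered by calling it, with
 * one scratch variable reused across a getter and a setter.
 */
bool probe_by_calling(struct mock_fan *fan)
{
	uint32_t speed = 0;

	mock_get_fan_control_mode(fan, &speed); /* speed now holds a mode */
	/* ...which is then written as a duty cycle */
	return mock_set_fan_speed_pwm(fan, speed) != -EOPNOTSUPP;
}

/* Side-effect-free alternative, in the spirit of the patch: check the pointer. */
bool probe_by_pointer(int (*set_pwm)(struct mock_fan *fan, uint32_t pwm))
{
	return set_pwm != NULL;
}
```

Starting from a fan in auto mode at duty 128, probe_by_calling() reports the setter as present but leaves the fan in manual mode with the mode value (2) as its duty cycle. probe_by_pointer() gives the same answer without touching the fan.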



