On 1/11/2022 8:02 AM, Quan, Evan wrote:
[AMD Official Use Only]-----Original Message----- From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> Sent: Monday, January 10, 2022 4:31 PM To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx> Subject: Re: [PATCH] drm/amd/pm: correct the checks for fan attributes support On 1/10/2022 1:25 PM, Quan, Evan wrote:[AMD Official Use Only]-----Original Message----- From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> Sent: Monday, January 10, 2022 3:36 PM To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx> Subject: Re: [PATCH] drm/amd/pm: correct the checks for fan attributes support On 1/10/2022 11:30 AM, Evan Quan wrote:Before we relied on the return values from the corresponding interfaces. That is with low efficiency. And the wrong intermediate variable used makes the fan mode stuck at manual mode which then causes overheatingin3D graphics tests. Signed-off-by: Evan Quan <evan.quan@xxxxxxx> Change-Id: Ia93ccf3b929c12e6d10b50c8f3596783ac63f0e3 --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 23+++++++++++++++++++++++drivers/gpu/drm/amd/pm/amdgpu_pm.c | 20 ++++++++++---------- drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 12 ++++++++++++ 3 files changed, 45 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.cb/drivers/gpu/drm/amd/pm/amdgpu_dpm.cindex 68d2e80a673b..e732418a9558 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c @@ -1547,3 +1547,26 @@ intamdgpu_dpm_get_dpm_clock_table(structamdgpu_device *adev,return ret; } + +int amdgpu_dpm_is_fan_operation_supported(struct amdgpu_device*adev,+ enum fan_operation_id id) +{ + const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs; + + switch (id) { + case FAN_CONTROL_MODE_RETRIEVING: + return pp_funcs->get_fan_control_mode ? 1 : 0; + case FAN_CONTROL_MODE_SETTING: + return pp_funcs->set_fan_control_mode ? 1 : 0; + case FAN_SPEED_PWM_RETRIEVING: + return pp_funcs->get_fan_speed_pwm ? 1 : 0; + case FAN_SPEED_PWM_SETTING: + return pp_funcs->set_fan_speed_pwm ? 1 : 0; + case FAN_SPEED_RPM_RETRIEVING: + return pp_funcs->get_fan_speed_rpm ? 1 : 0; + case FAN_SPEED_RPM_SETTING: + return pp_funcs->set_fan_speed_rpm ? 1 : 0; + default: + return 0; + } +} diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.cb/drivers/gpu/drm/amd/pm/amdgpu_pm.cindex d3eab245e0fe..57721750c51a 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c @@ -3263,15 +3263,15 @@ static umode_thwmon_attributes_visible(struct kobject *kobj,return 0; /* mask fan attributes if we have no bindings for this asic to expose*/- if (((amdgpu_dpm_get_fan_speed_pwm(adev, &speed) == -EINVAL)&&+ if ((!amdgpu_dpm_is_fan_operation_supported(adev,FAN_SPEED_PWM_RETRIEVING) && As per the current logic, it's really checking the hardware registers.[Quan, Evan] I probably should mention the "current" version you see nowis actually a regression introduced by the commit below:801771de0331 drm/amd/pm: do not expose power implementation detailstoamdgpu_pm.c The very early version(which works good) is something like below: - if (!is_support_sw_smu(adev)) { - /* mask fan attributes if we have no bindings for this asic to expose*/- if ((!adev->powerplay.pp_funcs->get_fan_speed_pwm && - attr == &sensor_dev_attr_pwm1.dev_attr.attr) || /* can'tquery fan */- (!adev->powerplay.pp_funcs->get_fan_control_mode && - attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr)) /* can'tquery state */- effective_mode &= ~S_IRUGO; So, the changes here are really just back to old working version. It aims toprovide a quick fix for the failures reported by CQE. I see. Could you model on it based on below one? This is preferrable rather than introducing new API. drm/amdgpu/pm: Don't show pp_power_profile_mode for unsupported devices.[Quan, Evan] In fact, those piece of code from the mentioned change was updated as below } else if (DEVICE_ATTR_IS(pp_power_profile_mode)) { if (amdgpu_dpm_get_power_profile_mode(adev, NULL) == -EOPNOTSUPP) *states = ATTR_STATE_UNSUPPORTED; } As the access for adev->powerplay.pp_funcs from amdgpu_pm.c was forbidden after the pm cleanups. So, we have to rely on some (new)API in amdgpu_dpm.c to do those checks.
To be clear, the model is to use a dummy call to check if the API is implemented -
amdgpu_dpm_get_fan_speed_rpm(adev, NULL) == -EOPNOTSUPP amdgpu_dpm_set_fan_speed_rpm(adev, -1) == -EOPNOTSUPP That is better instead of adding another API and flags for each set/get API. Thanks, Lijo
A more proper way to cleanup all those attributes support checks stuff is to have a flag like "adev->pm.sysfs_attribtues_flags". It labels all those sysfs attributes supported on each ASIC. However, considering the ASICs involved and the difference between them, that may be not an easy job. BR EvanThanks, LijoFor ex: we could have some SKUs that have PMFW based fan control and for some other SKUs, AIBs could be having a different cooling solution which doesn't make use of PMFW.attr == &sensor_dev_attr_pwm1.dev_attr.attr) || /* can't queryfan */- ((amdgpu_dpm_get_fan_control_mode(adev, &speed) == -EOPNOTSUPP) &&+ (!amdgpu_dpm_is_fan_operation_supported(adev,FAN_CONTROL_MODE_RETRIEVING) &&attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr)) /* can'tquery state */effective_mode &= ~S_IRUGO; - if (((amdgpu_dpm_set_fan_speed_pwm(adev, speed) == -EINVAL)&&+ if ((!amdgpu_dpm_is_fan_operation_supported(adev,FAN_SPEED_PWM_SETTING) &&attr == &sensor_dev_attr_pwm1.dev_attr.attr) || /* can'tmanage fan */- ((amdgpu_dpm_set_fan_control_mode(adev, speed) == -EOPNOTSUPP) &&+ (!amdgpu_dpm_is_fan_operation_supported(adev,FAN_CONTROL_MODE_SETTING) &&attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr)) /* can'tmanage state */effective_mode &= ~S_IWUSR; @@ -3291,16 +3291,16 @@ static umode_thwmon_attributes_visible(struct kobject *kobj,return 0; /* hide max/min values if we can't both query and manage the fan */ - if (((amdgpu_dpm_set_fan_speed_pwm(adev, speed) == -EINVAL)&&- (amdgpu_dpm_get_fan_speed_pwm(adev, &speed) == -EINVAL)&&- (amdgpu_dpm_set_fan_speed_rpm(adev, speed) == -EINVAL)&&- (amdgpu_dpm_get_fan_speed_rpm(adev, &speed) == -EINVAL))&&+ if ((!amdgpu_dpm_is_fan_operation_supported(adev,FAN_SPEED_PWM_SETTING) &&+ !amdgpu_dpm_is_fan_operation_supported(adev,FAN_SPEED_PWM_RETRIEVING) &&+ !amdgpu_dpm_is_fan_operation_supported(adev,FAN_SPEED_RPM_SETTING) &&+ !amdgpu_dpm_is_fan_operation_supported(adev,FAN_SPEED_RPM_RETRIEVING)) && If this is the case, I think we should set pm.no_fan since nothing is possible.[Quan, Evan] Yep, I agree a more optimized version should be somethinglike that.Let's take this a quick solution and do further optimizations later. BR EvanThanks, Lijo(attr == &sensor_dev_attr_pwm1_max.dev_attr.attr || attr == &sensor_dev_attr_pwm1_min.dev_attr.attr)) return 0; - if ((amdgpu_dpm_set_fan_speed_rpm(adev, speed) == -EINVAL)&&- (amdgpu_dpm_get_fan_speed_rpm(adev, &speed) == -EINVAL)&&+ if ((!amdgpu_dpm_is_fan_operation_supported(adev,FAN_SPEED_RPM_SETTING) &&+ !amdgpu_dpm_is_fan_operation_supported(adev,FAN_SPEED_RPM_RETRIEVING)) &&(attr == &sensor_dev_attr_fan1_max.dev_attr.attr || attr == &sensor_dev_attr_fan1_min.dev_attr.attr)) return 0; diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.hb/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.hindex ba857ca75392..9e18151a3c46 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h @@ -330,6 +330,16 @@ struct amdgpu_pm { bool pp_force_state_enabled; }; +enum fan_operation_id +{ + FAN_CONTROL_MODE_RETRIEVING = 0, + FAN_CONTROL_MODE_SETTING = 1, + FAN_SPEED_PWM_RETRIEVING = 2, + FAN_SPEED_PWM_SETTING = 3, + FAN_SPEED_RPM_RETRIEVING = 4, + FAN_SPEED_RPM_SETTING = 5, +}; + u32 amdgpu_dpm_get_vblank_time(struct amdgpu_device *adev); int amdgpu_dpm_read_sensor(struct amdgpu_device *adev, enumamd_pp_sensors sensor,void *data, uint32_t *size); @@ -510,4 +520,6 @@enumpp_smu_statusamdgpu_dpm_get_uclk_dpm_states(struct amdgpu_device *adev,unsigned int *num_states); int amdgpu_dpm_get_dpm_clock_table(struct amdgpu_device *adev, struct dpm_clocks *clock_table); +int amdgpu_dpm_is_fan_operation_supported(struct amdgpu_device*adev,+ enum fan_operation_id id); #endif